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. THE SIMULTANEOUS DISTRIBUTION OF 
CORRELATION COEFFICIENTS 


By SIR RONALD A. FISHER 
Division of Mathematical Statistics, C.S.I.R.O. 


SUMMARY. From any sample in ¢ variables an inter-related system of $¢(t—1) correlations rjj 
can be calculated. Their simultaneous distribution depends only on the corresponding system of true 
correlations in the hypothetical multinormal population sampled, and in the following paper is expressed 
explicitly in terms of them. 


1. Ifa sample of № be available from the simultaneous distribution of ¢ 
correlated variates, direct multiplication of the N independent elements of frequency 
gives the expression 


(2т)-#(о, т»... о) | pu | 3 


EAT sete 2(a,—p;)(@j— tj) is 1 
exp [-45( SEAM pes... ctm Ий р. 
da. . dar dag. . Atg datis. -Aey SESS (LD 


where S stands for summation over the N sets of observations; ту, 05, ..., т, for the 
standard deviations in the population sampled, |p;;| for the determinant of the popu- 
lation correlations, and př for the elements of the reciprocal of the matrix of 
population correlations p;;. 


From this primitive product, the simultaneous sampling distributions are to 
be inferred of (a) the sample means, 


= eo) 


1 
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of each variate, (b) of the estimates of the variances, 


1 
N-1 


82 = 


S(z—z) ; E) 
and finally, of the estimated coefficients of correlation 


— S(y,—2;)(v;—2;) 
8,;(N —1) 


and from this the marginal simultaneous distribution of the Tu. 


тӯ 


(4) 


This program was carried out in 1915 for the case of two variates, and the 
distribution of the estimate r in terms of the true value p only, has been available in 
practical use for many years. For more than two variables the problem seems to 
have been left in a quite incomplete form. 

As in the bivariate case it is sufficiently clear that the estimates s; and ry are 
distributed independently of the empirical means, V, in a distribution which does 
not involve the true means, д, so that on integrating out all means, the marginal 
distribution is the same as that conditional on specific values. Equally, the ((1— 1)/2 
statistics т will be distributed independently not only of the means, but also of 
the standard deviations. 


The factor representing the simultaneous distribution of the V; is easily found 


and removed by considering the transformation (rotation) of each variate effected by 
the orthogonal matrix G, 


where X; = GX; 

and X; stands for the veotor (Eiis Lia -s Bey) 

for if Хи = Sey) М = zV N 
the ¢ variables ба» 


will be distributed in exactly the same simultaneous distribution аз the original £ 
variables, so that the omission of this one factor leaves a product equivalent to that 
appropriate to a simple sample of (N—1) i.e., 


(27r) (оо... о) | py 3-0 


exp [- 5 (ren ше e. Ja Ese (5) 


in which s? has been substituted for 


yc; Ce- 
х (6) 
N—1 


and - 7588; for {$ (пе) —Nz 2A J 
and finally, dv for the element of volume in *(N —1) dimensions. 
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2. The second important step was taken by Wishart (1928) by expressing 
the element of volume in terms of the quantities 


bu = Вт 
Eg = $(@—)(а)—®,). 
Regarding these quadratie forms as coordinates, Wishart showed that 


ў qf GN—t1)/4 
с = 


where V stands for the determinant of the &;;. 


й = V -i-2)2 d£, ... d£, au) 


To obtain the distribution explicitly in terms of the s; and the т, by substi- 


tuting 
£g = (3—1) 1 .. (8) 
Ey = (N — Urgs;s;, ] 

we may note V = 82... 8(N—1)|ry| 


where |ry| stands for determinant of the 7;, and 


(5,1...) Eu. Su) ee = 2(s,. 1 i. (N— То”, 


бе» s) 3 

giving (N 1) 9 -DrD py END (enn 
N—3 N—i—2\, 9-9 Мо... 0 
E cone 


co [=P rete 


ye шс к б) 
ГА с; 


3. So far we have the direct result of substituting the s; and the r; in Wishart’s 


- distribution when the numerical factors are restored, which is not quite easy with 


the form as left by Wishart. Further simplification flows from the substitution 


щ= „Уря Пре ... (10) 
which leads to 
—t(t—1)/4 9—«N —3)/2 —}(N-1) А Эу 
T 3 N—1—2 ү а уға) ү PIÈ па) 
p | (pap p) 
2 2 


gr Т / 
ехр [55 +... Т)! 

(иу... щи... ОВ 

A З 
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m 
where Уз = тив 82) 


It may be observed that the range of s; is from 0 to co, that и, is s; multiplied 


by a positive constant, and that 


oo 
J i Г окр. 9st меча, du, ‚... (13) 


depends only on the coefficients Yj, and on N. 
Writing Py_o¥y) 


for this integral, the simultaneous distribution of the values ry, in number 
t(t—1)/2, is 
q- 74 ›—(3—3)/9 T —(N-1) 


ean loge, {== ! (ори. oo 70 


rg | 0 7179? p. (y) Mary) m 4) 


when the variation of s, 8, +++) 8, із eliminated. 


B y ruptis 
Denoting Ёз = NI 
and therefore : Yi = ypy 


the expression (14) may be written 


т—!@—1)4 3—(N—3)2 


a3 СҮ зү ЇЗ E 
(т. grey , 


(N—1—2)/2 


Iryl Ех (у) TI(dr;;) ... (15) 


2 


Geometrically, if Pij is the cosine of the angular distance between two points i 
and j on a hypersphere, then рі; is the cosine with reversed sign of the dihedral 


angle opposite to these, or to the cosine of the angular distance between correspon- 
ding points of the polar figure. 


Wilks (1944, р. 120) proposed that this elimination should be carried out by 
expanding the exponential as in (9) in powers of its exponent, a quadratic in з, and 
by eliminating these variates by integration. This, however, is not a feasible path 


` 
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4. Analytic notes. (a) Certain definite integrals. 

From equation (15) can be derived a number of cognate identities. If, for 
any values of N and t all the true correlations p;; are zero, it follows that all у are 
zero also, Consequently the determinant 


1021 
is replaced by unity, and the function Р by 


9tN-9)/2 ( m. )' 


then putting N=t+2 

we find the. generalised volume obtained by integration with respect to rą over all 
possible values i.e., over the closed region within which the determinant remains posi- 
tive, in the expression 


Diese eel 
2 2 QqU-DA ... (16) 


For any value of ¢ the number of dimensions of the generalised volume is 
101—1); the following table gives algebraic and numerical value for moderate values 
of t. > 


TABLE. GENERALISED VOLUME OF REGIONS OF 
INTEGRATION FOR CORRELATION 


COEFFICIENTS 

t dimensions generalised numerical 
$ volume | value 

2 1 2 2. 

3 3 72/2 4.9348 

4 6 3272/27 11.6973 _ 

5 10 376/128 22.5326 

6 15 933569155 31.114 

7 21 5712/3421! 27.858 

8 28 22412/345877 14.877 

9 36 5°720/3*2°° 4.4115 
10 45 2102013725778 ‚68297 


More generally if N = t+2+k and |rj| = D*, 


ky k+1, k+t—-1, 
DU E ES 
fie f D'dry,...dr 44 = ——À————————— тч — 4 ... (17) 
k+t—1 )' 
E ; 


giving the integrals of all powers of D, and the means of all powers over this region. 
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(b) Derivatives of Fy (у = 0). : 
The function Fy_(y;) when y; are all zero may be regarded as a special 
case of the more general function 


Таа е= "3/2 dat, Pug 2400 etl? dus... 


for all ¢ variables w; this product is easily evaluated as the product of 


g(N+s1—3)/2 е gN +s: - 3)/2 EAST A E218) 
Consequently, if аз; are any positive integers, or zero, and if 
La; = A; 
one 1 
80 that ЗА; —2X£ Xa; 
i i»j 
then the differential coefficient 
оу 
дузй y-aAYij = 
will be a(-4i-2)J2 N--4,—2 | мА, за N +4,—2 , (19) 
2 5 n 


80 giving completely the differential coefficients of F х-(Уу= 0) with respect to the 
(7—1) variables у. 
Tn particular the first differential coefficients are all equal, namely, 


N-—2, N-3, 2 
Ns fae (t-2)(N—9)/2 1) 
e geri Шор!) 


ог 1 2. Е 
Een 
Ж jN—2 | уз 
so that fos log = 2 (rcs : je all i,j. ... (20) 
нә 
0—2 | \4 
Similarly 2t log F —(N—1j—4 is : р 
2 
= 2 —9 
(2521 ) x ) в. (21) 
05,9; ч her y —4 [= } 
апа. 55:55, log F = 1 | х 
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where separate terms refer to different partitions of the aggregate a;. Log F is then 

expressible as a multiple power series in y;. In connection with these ratios of fac- 

torials, it may be useful to note that _ i 
ео 

may be expanded in the form 


a а с Е Р i en 
so that $t log вы 
e lg F~ N— З 
вто 8 (И) 


all being nearly equal, apart from a simple coefficient. 
- (©) Geometrical interpretation. 

The expression Z(y,;) has a rather simple geometrical interpretation. When, 
for example, t = 3 we may consider the transformation of the original coordinates 
т, v, w into new coordinates w’, v', w', according to homogeneous linear equations 

w = Ayud- tp vw ў 
v' = Agu-l-ugo-]- Vw 2 
w = Agu-- pao -- vgw. ў 


Тһеп шо = uà -]-v3--w3 — узо — узо — узо 

АРАА = 1, рая Изуз-Е gv = —Узз 

и-и = 1, Ау Азу Азув = — yas 

У-У У = 1, А Аи --АзИз = — Уз» 
and the bounding edge v = 0, w = 0 has direction cosines 

Ay Аз, As, 
and so with the others, so that the cosines of the angles between pairs of such edges 
are Шу изу ИзУв = —Yoa eto. 
ao = HAT 

Noting that I е rdr = 4/2 (4)! = МТ 
or for t dimensions 7 е "Виза = 20792 us 2 ... (23) 


it is seen that Р (уз) is for three dimensions, the area of the spherical triangle the 
sides of which have cosines (—y;), multiplied by 4/7/2, and for t dimensions the 
generalised volume of the corresponding hyperspherical figure, multiplied by 


aae, 12 | 
2 
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and in each case divided by 
д(ш, v', wu, v, w), 


or by the determinant 


А д У 
№ да. у 
Аз Из Уз 
of which the square is 
1 т 


| exe meni тү Л 1 
If therefore this is written D?, then in general 


Улз 1 —Узз | 


Еу) = 222 -— 15 es (24) 


where V is the generalised volume of the figure defined by —у;, and D is the 'genera- 
lised sine’ of the solid angle it subtends at the centre of the hypersphere, or the volume 
defined by the ¢ unit vectors. 


When ¢ is even F, (yy) can be derived from F, directly by differentiation 
with respect to chosen variables, e.g., for t = 4, as 
д? 
971207 34 
If t is odd, the suffix can be increased in this way only by steps of two; so when (=3 


Fora) s. (25) 


Р.(у;) = Ty) ... (26) 


дз 
O10 25051 
A general form for F,, when t is odd, is thus to be desired. In the case ¢ = 3, fairly 
direct integration may be used to show that, if cos 0, = уз, etc., 0 < 0 < 7, then 


1 0 
FD ра > Tn G, 7801 Vis) 9-00 Уша) +10 ¥s1 +7127 23)}- -.. (27) 
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RESOLVABLE INCOMPLETE BLOCK DESIGNS WITH 
TWO REPLICATIONS* 


By R. C. BOSE 
and 
K. R. NAIR** 
Institute of Statistics, University of North Carolina · 
SUMMARY. A class of resolvable incomplete block designs with two replications is given here. 
This includes the known two replicate designs, which are either the sample square or rectangular lattices 
and their extensions. The designs in this class are not necessarily PBIB designs but their duals are PBIB 
designs with three associate classes, Nevertheless the treatments in these designs have an association 
scheme with a maximum of seven associate classes. i 
A list of all the designs with k < 10 belonging to this class has beon prepared and methods of 


construction of designs explained. The necessary formulae for analysis with recovery of interblock 
information are derived. 


1. INTRODUCTION 
Incomplete block designs become most useful when the number of treatments 
or varieties is very large. In such cases it is seldom that the research worker will 
have experimental material for more than a small number of replications. It is there- 
fore of practical importance to make available to him a repertoire of designs involving 
as few replications as possible. 


For a systematic search for such designs, their construction and analysis, 
it is advisable to go step by step from designs requiring two replications to those re- 
quiring three replications, four replications, and so on. 


In this paper, we shall consider two-replicate designs only. There are two 
main groups of Incomplete Block designs already investigated, namely, the Balanced 
Incomplete Block designs due to Yates (1936) and the Partially Balanced Incomplete 
Block (PBIB) designs developed by the authors (1939) and whose definition was 
slightly modified by Nair and Rao (1942). 

In virtue of the fact that the number of replications of each treatment in a 
Balanced Incomplete Block design cannot be less than the number of plots per block 
and since the latter cannot be less than two, the search for two-replicate Balanced 
Incomplete Block designs has to be confined to the case where there are two plots 
per block. There is only one set of such balanced designs, namely, the one where 
every pair of treatments is made to constitute one block. In the familiar notation : 

A(v—1) = (0—1) у 
the only two-replicate balanced design we can construct is for the trivial case, 
v= 3, k = 2, r= 2, b = 3, А = 1. 

Since Partially Balanced Incomplete Block designs are free from the res- 
triction r > k a number of PBIB designs can be constructed for which r = 2, Bose 
(1951) made an exhaustive study of two-replicate PBIB designs having two associate 
classes. Nair (1950, 1951) gave some examples of two-replicate PBIB designs 


*Special report submitted in 1952 to The United States Air Force under contract AF 18(600)-83 
monitored by the Office of Scientific Research, 
**Present address: Central Statistical Organisation, Government of India, New Delhi. 
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involving three or four associate classes. Among the designs considered by them 
there is а sub-class where the blocks can be arranged in two separate replication groups. 
This is à feature which, if taken advantage of while laying out the experiment, can 
help in recovering inter-block information, and in thereby improving the efficiency 
of the design. Such designs will be called resolvable designs. 

The only resolvable two-replicate PBIB designs found so far are basically 
the simple square and rectangular lattices involving р? and p(p—1) treatments res- 
pectively in blocks of p and (p—1) plots. The former involves two and the latter four 
associate classes of treatment comparisons. By replacing each treatment by a set 
of q treatments in each of these lattices we can evolve two-replicate PBIB designs 
with three and five associate classes for p*g and p(p—1)q treatments in blocks of pd 
and (p—1)g treatments respectively. 

Tn this paper we shall evolve a new class of resolvable Incomplete Block designs 
having ошу two replications. In general, these designs have seven associate classes 
of treatment comparisons, but in particular cases they may be less than seven. This 
new class of designs does not in general belong to the category of PBIB designs defined 
by Bose, Nair and Rao, but the dual designs of this class, i.e., the designs obtained 
by interchanging treatments and blocks, are PBIB designs with three associate classes. 
The four PBIB designs mentioned in the last paragraph are special cases of this new 
class of designs. 

The method of analysis of the new class of designs with recovery of interblock 
information is simple and straightforward. The necessary formulae have been derived. 

A list of all the designs with Æ < 10 has been prepared and their methods of 
construction explained. 

j 2. THE DESIGN 
(1) Consider the ux v incidence matrix ofa Symmetrical Balanced Incomplete 


Block design where и treatments are replicated ғ times so that there are и blocks of 
r plots each and ~ 


A(u—1) = r(r—1). $oxea0) 
Let the matrix be denoted by 
N = (nj) ... (2.2) 


where 7;;, the element at the junction of i-th row and j-th column, is either 0 or 1. 
The following conditions are satisfied by the elements of matrix (2.2) 


u u 

Уп, = т? = j= s 

Mee =F (j= 1, 2, ..., ш) 3 (23) 
È Èn? ; 

EI атл (i = 1,2,...,u) note (2.4) 
и 

2 rn A (4 Ati =1, 2, ..., u) ... (2.5) 
и 

È тп = A (GG #h=1,2,...,u). ... (2.6) 
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(2) In the matrix (2.2) we shall replace every m; having value 1 by a set of 
р treatments and every n; having value 0 by a set of q treatments. No treatment 
is repeated inside the matrix. The total number of treatments in any row or column 
of the matrix will be 
k=rp+(u—r)g ` oa 207) 
v = ku. (02:8) 

То construet the design we take all the Ё treatments in ће same row of the 
matrix to form a single block and thus build up v blocks from the u rows to form one 
replication. We do likewise with the Ё treatments in the same column of the matrix 
and form и blocks of a second replication from the u columns. The design will there- 
fore have two resolvable replications with 2u blocks of k plots each. 


and in the whole matrix : 


(3) We will consider a few examples at this stage. 
Example 1. Let u= 3, т = 2. Then А = 1 and 


1 И 0 
N= 0 1 1 
y 0 1 


Let p = 2, q= 1. Representing the /-th treatment in the cell (ij) by th, 
the scheme of the design is : 


К Шш саке Т Сы ые С + 
th ty fe tie hs 
ty |82 8 | tae б 
“їй 0 tga ts № 
Re-identifying the treatments and calling them 1, 2, ..., 15, we could write 
the scheme in a simpler way as 


11,12 13 1445 


The design is then : 


replication I replication II 

block treatment number block treatment number 
number : number 

(1) т (4) 1 2 6 11 12 
(2) бг 2:58: 9. 10 (5) ERRAT gis ies xe 1 
(3) 11 12 13 14 15 (6) 5 9 10 14 в 


pou ——————— €— 
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We thus have a resolvable two-replicate design for 15 treatments in six blocks 
of 5 plots each. 


Example 2. Let и = ғ. In other words we are dealing with symmetrical 
complete blocks, so that the n;; in the incidence matrix is 1 in every cell and A = r. 
Putting р = 1, we get the simple square lattice for v = u?, k = и. When р> 1, we 
get the extended square lattice design : v = u*p, k = up. 

Example 3. Let и = r--l. Then A—r—1. By putting p — 1, @=0, 
we get the simple rectangular lattice for v = и(и—1), k = (u—1). If p>1l1,q=0, 
we get the extended rectangular lattice design : v = u(u—1)p, k = (u—1)p. 


3. ESTIMATION OF TREATMENT EFFEOTS 


(1) Let us denote the m;; treatments of cell (ij) by (1), (2), ..., (ть) 
and their effects by 
ap, 1), ey nip), 


where m;; = p or q according as n; = 1 or 0. 


u gj 
Let ROSE EY $: (8:1) 
jel del 
= Sum of the treatment effects for row i 
cep-i 5 3.2) 
7 TAE, ЩЕ 


= Sum of the treatment effects for column j. 


For the sake of unique estimation of the effects, we shall make the usual 
assumption that У У У = 0, or, that 


È Rit) = X Ой) = о. О (8.3) 


For treatment ij(l), let x? be the observed value of the character under study 
in the first replication and УФ that in the second replication. We shall call these 
replications x- and y-replications. 


ш Mij 
Let Riz) = X af ... (34) 
ј=1 1-1 
и mij 
Ca) = À Xa. 32 TERI 
Let Т) = X В) = X Oa) 
i=l j=1 
Ry) = XY ур 
W= 27 и .. (3.6) 
O-— XY yu. -— 
Gly) = rid yp. wy (83.7) 
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Let Т(у) = XR(y) = ХСду). 

It will be noted that R,(x) is the total value of v for the i-th block of the 
x-replication and that C;(y) is the total value of y for the j-th block of the 
y-replication. 

Let Qf) be the total observed value of the character under study for thejtwo 


replications of treatment ij(l) minus the sum of the corresponding block means. 
Hence, 


OP = ар Бир — ВО). ^ (8) 
Let us introduce a quantity Q; defined by 
Qj = видно, (eT). oe 89) 
Let R(Q-X29p;RQ)2 * op. „ (3.10) 
jel lel j=l 1=1 
оке) = 7 ap; =È op. (811) 
i=] 1=1 i=] del 


(2) The normal equation corresponding to the treatment ij(l) may be written 
ae 
k(wQQ --w'Qi?y = Зи —(w—w' {R,(t)-+-C;(t)} ... (3.12) 
where w and и’ are reciprocals of the intra- and inter-block variance per plot. 


Summing up over j апа? 


Ko R(Q)--w ВО} = Мои )В) (0—0) 3 E EO 
Summing up over i and J 
KWO Q) HW OQ) = Кизи Oww) X тув)... (8140 
Substituting for Оу) from (3.14) in (3.13) and putting 
lw e ... (3.15) 
we get, O4 Ry(t)-4- Cis Ra(t)4-...-- Ci, Bult) = k(wR(Q)--w' R(Q')) ... (3.16) 


y 3 my eo ero y 


where Ou = k(w-Ew') {1- ү (m 4-m&-- ... nk) } ЖО 397) 
2 
Си = =F бои Jmm тата -+ тат) ... (3.18) 
ih. 
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Scince mj; = p or q according as n;; = 1 or 0, (3.17) and (3.18) reduce to 


Cu = Kw+w') [1— (Y mte] ... (3.19) 


On = — Y (ww \Ap*+2r—A)pgq-+(u—2r-+A)g}. _... (зо) 
Using the condition (3.3) we can simplity (3.16) to the form 
(Cu On) ВАО = Мове) RQ) +Y  mjC(Q)-wO Q^)... (3.21) 
Similarly, we got 
(04—04)6,0) = bul (Q)4-w'0,^) y E mywR(Q)+w' RQ’). ... (3.22) 
The value of (Он—Си) occurring in (3.21) and (3.22) can be simplified to the form 
бб F (o--w (E —(r— AXp—g*ys. ... (3.23) 
(3) Substituting (3.21) and (3.22) in (3.12), we have 


tf = э (OP +! Qi) + aL [(e(Q)--w BíQ')--(00,)4-w C (Q^) ] 
ti [ тевд) wro È mabeC(Q)--w'OqQ)] .. (8.24) 
where, fe = B—(r—A)(p—qyy?. .:. (3.25) 
By putting w’ = 0 in (3.24) and (3.25) we can obtain the intra-block estimate 
of the treatment effect, namely, 

HRQ) +O} È mjR(Q)4- $ тс) 

- (0 — 1 00 25 jc 

LAE Бк шуут иш 


* 


(3.26) 


4. VARIANCES OF ESTIMATED TREATMENT DIFFERENCES 


_ 1. We are interested in comparing the effects, as estimated by (3.24), of 
any two of the v treatments. "There are seven types of treatment pairs each having 
a different expression for the variance of a difference between the two members of the 
pair. They are separately considered below. 


(1) For two treatments belonging to the same cell of the uxu matrix, the 
variance of the difference of their effects is 1/0. 
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(2) For two treatments belonging to different cells of the same row or column 
ky 

и ^ 
(3) Two treatments not belonging to the same row or column of the uxu 
matrix fall into five sub-classes. Let us denote the cells to which these treatments 
belong by (gh) and (ij) where g 52 i and h 5 ј. We know that there ате two types of 
cells, namely, those having p treatments and the others having q treatments. If we 
take the quadrant formed by the four cells : 


(gh) (90) 

Gh) (0) 
it is clear that there are 16 types of quadrants. They fall into five groups according 
as the value of the cross difference 


of the uxu matrix, the variance of the difference in effects is ( 1+ 


а = тт т ... (4.1) 
is either 0, (р—9), (4—2), 2(p—4) ог 2(g—p). 
(i) 4=0. There are six quadrants in this group, namely, 
hj В) hj- hj Wi hg 


$ 20199 99129 d p.n 


Variance of the difference between the effects of a treatment in cell (gh) and 
a ante 2ky 
in cell (ij) is —- а : 
(ii) 4=(р—9). There are four quadrants in this group, namely 


GPa Pda = 65-9: 0. 


i D рр 4d фр 


Variance of the difference between the effects of a treatment in cell (gh) and in cell 
ann 2ky--(p—2)y* 
ü) ів 2. [ 1+ г | 


(iii) d= (q—p). There are four quadrants in this group, namely, 


Я ро Ч po po 


2999 рф -p.d 


э, 2 
Variance in this case is zl ІВ ру" |: 
w Hh 
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(iv) d = 2(p—q). There is only one quadrant in this group, namely, 


РЧ 


ЧР 


Variance is 1 | 1+ АР |, 
w n 


(v) d= 2(q—p). There is only one quadrant in this group, namely, 


ЧР 


29 


Variance is zl 1, 25 -3(0—p)y* 1. 
2 р 


4 


The variances in the cases (i) to (v) may be denoted by the common 
expression 


i ri 4 2ky+dy? ] 
ws Г 
where d = 0, (p—4), ({—р), 2(p—4), 2(g—p) respectively. 

2. Ofthe total number of T ?(v—1) differences between pairs of treat- 
ments, the number belonging to type (1) is : 


nro —1)-- (ing). 
The number belonging to type (2) is: 


ufrp(k—p)+(u—r)g(k—q)}. 
"The number belonging to type (3), sub-types (i) to (v) combined is : 


5 Mrpluk—2k-+-p)-+(u—r)q(uk—2k-+q)}. 


It is not easy in the general case to split up the last number into those belong- 
ing to each of the sub-types (i) to (v) and hence to calculate the mean variance of all 
comparisons. For particular designs the number of comparisons of each of the sub- 
types (i) to (v) can be determined and the mean variance calculated. 


3. Special cases. (a) When и — 3, r becomes 2 and the values of т; in 
the final scheme for the design : v = 3k, k = 2р--9 are distributed in the form 


Дүрс 
1 РР 
Dpoq p 


16 
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In this case quadrants with d = 0 do not occur. Hence the number of 
different types of comparisons reduces to six. 

Example 1 of Section 2 is a particular case when р = 2,4 = 1. 

(b) When р > 1, g = 0, comparisons of the sub-types (iii) and (v) do not 
occur. Hence the number of different types of comparisons reduces to five. 

(c) When р = 0, q > 1, comparisons of the sub-types (ii) апа (iv) do not 
occur. Hence, as in (b), the number of different types of comparisons reduces to five. 

(d) When p = 1, = 0 or p = 0, 4 = 1 there are only four types of compari- 
sons. The corresponding variances are 


xr) и" ena 


where k — ог u—r and u = K*—(r—A)y?. 

(е) If, in (b) w=r+l so that A-—r—1, we have the extended 
rectangular lattice : v = wu(u—1)p, k = (u—1)p having five variances for treatment 
comparisons. 

(f) If, in (d), u — r--1 so that А = r—1, we have the simple rectangular 
lattice: v = u(u— 1), k = (u—1). Here, и = k?—y? in the expressions for the four 
variances. 

(g) -When-p = q, which is the same as the case и = r = A the sub-types 
(i) to (v) merge into a single type of comparison and hence there are only three different 
variances instead of seven, They are: 


1 y 1 2y 
w’ v II ^e +). 
The design is now the extended square lattice: v = wp, = up. 
When p=q=1, there will be only two different variances, namely, 


A {1 ВБА 2) апа — A EF a The, design becomes the- simple square lattice : 
w . 
v 


Б. ESTIMATION OF w AND w' 


1. The values of w and w' entering into the estimates of treatment effects 
given by (3.24) and in the variances of the differences of these. estimates calculated in 
Seċtion 4 are not known a priori and have to be estimated from the data. This is 
done with the aid of the analysis of variance given in Table 1. 


2. The following sums of squares must be calculated first 


Total 8.8.— E E ни СОТО... (gu) 
i=1 j=1 1 2v 
Replication S.S. — cur B ... (8.2) 
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Blocks 8.8. 2 a $ {Вх (y |- 1 tre» что. s (63) 
(unadjusted) k lizi = 2 

Treatments S.S. = 1 X E Y fat + py — £O TU я ... (6.4) 
(unadjusted) 2 jor jad bel P 


3. The treatment sum of squares (adjusted) is calculated using the formula 


SSY р. QD ... (5.5) 


i=l j=l 1=1 


where #2 is the intra-block estimate given in 3.26. 


TABLE 1. ANALYSIS OF VARIANCE 


source of degrees of sum of squares mean 
variation freedom square 
replication 1 see (5.2) 

- blocks within 
replications 
(unadjusted) 2(u—1) see (5.3) 
treatments (adjusted) (0—1) seo (5.5) Е; 
intra-block error v—2u+1 by subtraction Е, 
total 2v—1 see (5.1) 
blocks within 
replications (adjusted) 2(u—1) (5.3) 4- (5.5)—(5.4) Ey 


4. The Blocks Sum of Squares within replications (adjusted) is obtained 
by using the relation 


Block (adj.)--Treatment (unadj.) = Block (unadj.)-+'Treatment(adj.). 


5. A test of significance of the differences among the treatment effects 
(intra-block) can be performed by the F-test where 


Е = ЕЕ, 
with degrees of freedom (0-1) and (v—2u-+-1). 
6. Using formulae given by Nair (1944) the estimates of w and w’ are as 


follows : 

NUUS mx $ 30s a (5.6) 
Hence, = = tm, | (6:9) 
and - у= P „2 (8) 
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RESOLVABLE INCOMPLETE BLOCK DESIGNS WITH TWO REPLICATIONS 


6. LisT OF DESIGNS 


Let us list all the designs for which k < 10. 


к 


k 


no. 


4 SS.L.* 
8 ES.L.* 


12 E.S.L. 
16 E.S.L. 
20 E.S.L. 


2 
4 
6 
8 


6 S.R.L.* 
9 S.S.L. 


12 
15 
18 
21 


2 
3 


10 
11 
12 
18 
14 
15 
16 
17 
18 


24 
27 
30 


10 


12 E.R.L.* 
15 


4 


18 ES.L. 
21 


6 


24 
27 


19 - 
20 


30 


10 


21 


18 E.R.L. 
21 


6 


22 
23 


24 


24 
25 
26 


27 ESL. 


30 


9 
10 


24 E.R.L. 
21 


8 


P 


28 
29 
30 


30 
` 80 E. R.L. 


10 


See footnote on the last page of the list 
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< 
E 
Е 
a 
4 
8 
Ё 
E 
Е 
5 
S 
E 
e 
E 
; 


no. 


12 S:R:L. 


3 
4 


3 


31 


S.S.L. 


16 


4 


20 
24 


` 34 
` 85 
- 36 
237 
. 38 


32 


36 


40 


10 
6 


24 E.R.L. 


28 


39 
40 


41 


32 ES.L. 
36 
40 


8 


42 


10 
9 
10 
4 
5 


43 


36 E.R.L. 


40 
-20 S.R.L. 


44 
45 
46 


25 S.S.L. 


30 
35° 


47 


48 


49 


40 
45 


-50 
E 


50 


10 


52 
- 53 


40 E.R.L. 
45 


8 


54 
55 


50 E.S.L. 


30 S.R.L. 


5 
6 


56 
57 


36 5.5.1. 


42 


58 
‚ 59 


48 


60 — 
61 


54 
60 


10 


60 E.R.L. 
42 S.R.L. 


49 S.S.L. 


56 


62 
"3: 


6 
7 


64 
. 65. 


63 


66 


70 


67 


56 S.R.L. 
64 S.S.L. 


7 
8 


68 
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no. AL a А p q k v 


BS Oe Qtr TO re aa 
71 8 7j Са Meee сы мут азы: 
аи opo C SCR SENE ES 0— 72 S.R.L 
73 9 8 1 1 1 9- 5-8 
14 9 8 1 1 2 10 90 
75 10 9 8 1 0 9 90 SRL 
в 10 9 8 1 1 10 100 8.81, 
Тт еба, C0 9 1 0 10 110 SRL. 
78 7 3 TET 0 32377221 
M emot QT 3 1 0 1 IE ERIB 
NE EIE 7 3 1 2 0 ЕТ 
81 7 3 1 0 3 8 56 
82 7 3 1 2 1 5104-170 
i; ри 1 1 0 52 
88 18 4 1 0 1 9 117 { 
ЫЕ ПЕЕ 4 1 2 Onesie, 102 
A EU RE. o] 5 1 1 0 5 105 
STE DOT 5 1 2 0 10 210 
аи 6 1 1 0 6 186 
ВО 2-57 8 1 1 0 8 456 
90 73 9 1 1 0 9 657 
a реа —:-—-——0-— 7-10: 10 
PIU 5 2 1 0 БЕЗЕ BOE 
93: SETS 2 0 1 6 66 
94 1] 5 2 2 0: 1530: ло 
95 16 6 2 1 0 6 . 96 
96 16 6 2 0 1 10 160 
Өн ыла. 9 g 1 0 9 333 B reau cf tant. & Psyl. Res 
98 15 1 3 1 0 7 105 (4S СЕВ T.) 
ЮО x ur E. 
100 25 9 3 1 0 9 225 | ufa 
MPO IO a LE QUE STO и 
102 19 9 4 1 0 9 171 D 
103 19 9 4 0 1 10 190 


3.3.1. — Simple Square Lattice [#=и2, k=u, r—2, b=2u] 
S.R.L.=Simple Rectangular Lattice [p —u(u— 1), k=(u—1), r=2, b—2u] 
£.S.L.=Extended Square Lattice [v—u2p, k —up, r=2, b—2u] 


E.R.L. —Extended Rectangular Lattice [v—w(u—1)p, k— (u—1)p, r=2, b—2u] et 
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We have 103 designs for Ё < 10 out of which 9 each are simple square lattices 


and simple rectangular lattices. If arranged according to magnitude of v and k, 
we have the following scheme 


j 
v= 4 6 8 9 12 15 16 18 20 21 24 25 27 28 30 32 35 36 40 


E OA 26-6 428 EDS E 7.6 в 
И 2677-9 7 ВВ EP 8 
отв: 104^ 7 8 OP 30-10) 5 7 9 10 
6 u$» e 8 9: 77-10 9 10 
E Уа. 2 10 ЕАО 
8 10 E. 

т 10 


pz 42 45 48 49 50 52 54 55 56 60 63 64 06 70 72 80 81 


——— ——— — 


тс овом 9: 8/8 з 8 I0 о 
6 9 Зв, 0 9 
(Жее E Sc 
CODEC Uy ты оо = UU UST NN 
— MM ——. 
v= 90 96 100 104 105 110 117 120 160 171 186 190 210 225 310 333 
k= oaoa ОТООР 8 IUIS 6-10 10 9 10 9 
АИ Таа: la н — 
oS) ыы ике ы ЕЕ ВЕ L 
с == MEMBRE? ЖИНДИ 
v= 456 657 910 
k= 8 9 10 


2. The simple and extended square lattices and the simple and extended 
rectangular lattices are PBIB designs with two, three, four and five associate classes 
respectively. The number of different variances to be calculated for treatment 
differences in these cases is 2, 3, 4 and 5 respectively. 


Out of the remaining designs listed above we have found that those designs 
derived from the symmetrical balanced incomplete block design 
u = 8°--в--1 
т = 8-1 
У 
and for which p > 0 and 9 = 0, so that 
k = p(s-+1) 
v = p(s+1)(s?+8+1) 
are PBIB designs with four associate classes, 
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The parameters are : 


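$v = p(s+1)(s^2+s+1)$, $r = 2$, $k = p(s+1)$, $b = 2(s^2+s+1)$
$\lambda_1 = 2$, $\lambda_2 = 1$, $\lambda_3 = \lambda_4 = 0$
$n_1 = p-1$, $n_2 = 2ps$, $n_3 = 2ps^2$, $n_4 = ps^3$

$$P_1 = \begin{pmatrix} p-2 & 0 & 0 & 0 \\ 0 & 2ps & 0 & 0 \\ 0 & 0 & 2ps^2 & 0 \\ 0 & 0 & 0 & ps^3 \end{pmatrix}, \qquad
P_2 = \begin{pmatrix} 0 & p-1 & 0 & 0 \\ p-1 & p(s-1) & ps & 0 \\ 0 & ps & ps(s-1) & ps^2 \\ 0 & 0 & ps^2 & ps^2(s-1) \end{pmatrix},$$

$$P_3 = \begin{pmatrix} 0 & 0 & p-1 & 0 \\ 0 & p & p(s-1) & ps \\ p-1 & p(s-1) & ps & 2ps(s-1) \\ 0 & ps & 2ps(s-1) & ps(s-1)^2 \end{pmatrix}, \qquad
P_4 = \begin{pmatrix} 0 & 0 & 0 & p-1 \\ 0 & 0 & 2p & 2p(s-1) \\ 0 & 2p & 4p(s-1) & 2p(s-1)^2 \\ p-1 & 2p(s-1) & 2p(s-1)^2 & p(s-1)(s^2-s+1) \end{pmatrix}$$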
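The parameters above can be checked numerically for the smallest case $s = 2$. The sketch below rests on an interpretation that this section does not spell out: we assume the treatments can be identified with p copies of the flags (point, line) of the projective plane with $s^2+s+1$ points, the $b = 2(s^2+s+1)$ blocks being the point pencils and the lines. The names `build_design`, `assoc`, `expected` and `verify` are hypothetical, and the class rule in `assoc` is likewise our reading. The code checks $n_i$, $\lambda_i$ and every $p^i_{jk}$ against the expressions given above.

```python
from itertools import combinations

# Fano plane PG(2, 2), so s = 2: points 0..6, lines given as point sets.
LINES = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
         {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
S = 2


def build_design(p):
    """Assumed construction: treatments (point, line, copy) for each flag and
    each of p copies; blocks are the 7 point pencils and the 7 lines."""
    trts = [(P, L, c) for L, pts in enumerate(LINES) for P in sorted(pts)
            for c in range(p)]
    blocks = [[t for t in trts if t[0] == P] for P in range(7)]
    blocks += [[t for t in trts if t[1] == L] for L in range(7)]
    return trts, blocks


def assoc(t, u):
    """Associate class (1..4) of two distinct treatments (our reading)."""
    (P, L, _), (Q, M, _) = t, u
    if (P, L) == (Q, M):
        return 1            # same flag, different copy
    if P == Q or L == M:
        return 2            # flags share a point or a line
    if Q in LINES[L] or P in LINES[M]:
        return 3
    return 4


def expected(p, s):
    """n_i, lambda_i and P_i as stated in the text."""
    n = [p - 1, 2*p*s, 2*p*s**2, p*s**3]
    lam = [2, 1, 0, 0]
    P1 = [[p-2, 0, 0, 0], [0, 2*p*s, 0, 0],
          [0, 0, 2*p*s**2, 0], [0, 0, 0, p*s**3]]
    P2 = [[0, p-1, 0, 0], [p-1, p*(s-1), p*s, 0],
          [0, p*s, p*s*(s-1), p*s**2], [0, 0, p*s**2, p*s**2*(s-1)]]
    P3 = [[0, 0, p-1, 0], [0, p, p*(s-1), p*s],
          [p-1, p*(s-1), p*s, 2*p*s*(s-1)],
          [0, p*s, 2*p*s*(s-1), p*s*(s-1)**2]]
    P4 = [[0, 0, 0, p-1], [0, 0, 2*p, 2*p*(s-1)],
          [0, 2*p, 4*p*(s-1), 2*p*(s-1)**2],
          [p-1, 2*p*(s-1), 2*p*(s-1)**2, p*(s-1)*(s*s-s+1)]]
    return n, lam, [P1, P2, P3, P4]


def verify(p):
    """Assert every stated parameter; return (v, b, k) of the built design."""
    trts, blocks = build_design(p)
    conc = {}
    for blk in blocks:
        for t, u in combinations(blk, 2):
            key = frozenset((t, u))
            conc[key] = conc.get(key, 0) + 1
    n, lam, P = expected(p, S)
    for t in trts:                      # n_i and lambda_i from every treatment
        sizes = [0] * 4
        for u in trts:
            if u != t:
                i = assoc(t, u)
                sizes[i - 1] += 1
                assert conc.get(frozenset((t, u)), 0) == lam[i - 1]
        assert sizes == n
    for t, u in combinations(trts, 2):  # p^i_jk for every pair
        got = [[0] * 4 for _ in range(4)]
        for w in trts:
            if w not in (t, u):
                got[assoc(t, w) - 1][assoc(u, w) - 1] += 1
        assert got == P[assoc(t, u) - 1]
    return len(trts), len(blocks), len(blocks[0])


print(verify(2))   # v = 42, b = 14, k = 6
```

With p = 1 the first class is empty, and the same check covers the three-class case below.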


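If $p = 1$, the design becomes a PBIB design with three associate classes. The parameters are:

$v = (s+1)(s^2+s+1)$, $r = 2$, $k = s+1$, $b = 2(s^2+s+1)$
$\lambda_1 = 1$, $\lambda_2 = \lambda_3 = 0$
$n_1 = 2s$, $n_2 = 2s^2$, $n_3 = s^3$

$$P_1 = \begin{pmatrix} s-1 & s & 0 \\ s & s(s-1) & s^2 \\ 0 & s^2 & s^2(s-1) \end{pmatrix}, \quad
P_2 = \begin{pmatrix} 1 & s-1 & s \\ s-1 & s & 2s(s-1) \\ s & 2s(s-1) & s(s-1)^2 \end{pmatrix}, \quad
P_3 = \begin{pmatrix} 0 & 2 & 2(s-1) \\ 2 & 4(s-1) & 2(s-1)^2 \\ 2(s-1) & 2(s-1)^2 & (s-1)(s^2-s+1) \end{pmatrix}$$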
3. Among the remaining designs it is likely that some are PBIB designs. For instance, design No. 79 is a PBIB design with the following parameters:


$v = 28$, $k = 4$, $r = 2$, $b = 14$

[The values of $\lambda_i$ and $n_i$ and the matrices $P_i$ for this design are not legible.]


It follows from this that design No. 81 is a PBIB design with five associate 
classes. 


4. Designs with p and q unequal and both different from zero will not be PBIB designs.


REFERENCES


Bose, R. C. and Nair, K. R. (1939): Partially balanced incomplete block designs. Sankhyā, 4, 337-372.


Bose, R. C. (1951): Partially balanced incomplete block designs with two associate classes involving only two replications. Cal. Stat. Ass. Bull., 3, 120-125.


Nair, K. R. and Rao, C. R. (1942): A note on partially balanced incomplete block designs. Science and Culture, 7, 568-569.


Nair, K. R. (1944): The recovery of inter-block information in incomplete block designs. Sankhyā, 6, 383-390.


Nair, K. R. (1950): Partially balanced incomplete block designs involving only two replications. Cal. Stat. Ass. Bull., 3, 83-86.


Nair, K. R. (1951): Some two-replicate partially balanced designs. Cal. Stat. Ass. Bull., 3, 174-176.


Yates, F. (1936): Incomplete randomized blocks. Ann. Eugen., 7, 121-140.


Paper received + November, 1961. 




ON A METHOD OF CONSTRUCTING SYMMETRICAL 
BALANCED INCOMPLETE BLOCK DESIGNS 


By S. S. SHRIKHANDE
and
N. K. SINGH
Banaras Hindu University
SUMMARY. A new method for the construction of Symmetrical Balanced Incomplete Block designs is indicated, which makes use of the existence of certain association schemes useful in Partially Balanced Incomplete Block designs. Some related problems are also considered.


1. INTRODUCTION


A Balanced Incomplete Block Design (BIBD) is an arrangement of v elements (treatments) into b sets (blocks) of $k$ ($< v$) distinct elements such that every pair of different elements occurs in exactly λ blocks. Then it is easy to see that every element occurs in exactly r of these sets. The numbers v, b, r, k, λ are called the parameters of the design, and they satisfy the relations

$vr = bk, \quad \lambda(v-1) = r(k-1), \quad b \ge v.$ (1.1)

The last inequality is due to Fisher (1940). A BIBD is called symmetrical if $v = b$, and hence $r = k$. Systematic methods of construction for BIBD have been given by various authors, for example Bose (1939), Rao (1945, 1946), Sprott (1954), Bose and Shrikhande (1960), and Haim Hanani (1961).
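A small helper makes the relations (1.1) concrete. This is an illustrative sketch with function names of our own: it reads v, b, r, k, λ off an explicit list of blocks, returns None if the blocks are not balanced, and tests the three relations. The Fano plane serves as a symmetrical example and the six pairs from four elements as an unsymmetrical one.

```python
from itertools import combinations


def bibd_parameters(v, blocks):
    """(v, b, r, k, lambda) if `blocks` (sets drawn from range(v)) form a BIBD,
    otherwise None."""
    sizes = {len(B) for B in blocks}
    reps = {sum(x in B for B in blocks) for x in range(v)}
    pair_counts = {sum(x in B and y in B for B in blocks)
                   for x, y in combinations(range(v), 2)}
    if len(sizes) != 1 or len(reps) != 1 or len(pair_counts) != 1:
        return None
    return v, len(blocks), reps.pop(), sizes.pop(), pair_counts.pop()


def satisfies_relations(v, b, r, k, lam):
    """The relations (1.1): vr = bk, lambda (v - 1) = r (k - 1) and b >= v."""
    return v * r == b * k and lam * (v - 1) == r * (k - 1) and b >= v


FANO = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
PAIRS_OF_4 = [set(e) for e in combinations(range(4), 2)]

print(bibd_parameters(7, FANO))        # symmetrical: b = v and r = k
print(bibd_parameters(4, PAIRS_OF_4))  # b > v
```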

Following Bose and Shimamoto (1952), we say that an m associate classes association scheme for v treatments is defined if the following conditions are satisfied.


(a) Any two treatments are either first associates, or second associates, ..., or m-th associates.

(b) Each treatment has $n_i$ i-th associates, $i = 1, 2, \ldots, m$.

(c) Given any two treatments which are i-th associates, the number $p^i_{jk}$ of treatments which are j-th associates of the first and k-th associates of the second is independent of the pair of treatments with which we start. Further, $p^i_{jk} = p^i_{kj}$ for $i, j, k = 1, 2, \ldots, m$.

A Partially Balanced Incomplete Block Design (PBIBD) is an arrangement of v treatments in b blocks such that

(1) each of the v treatments is replicated r times in b blocks of size k, and no treatment occurs more than once in any block;

(2) there exists an m ($\ge 2$) associate classes association scheme for v treatments as defined above;

(3) any pair of treatments which are i-th associates occur together in exactly $\lambda_i$ blocks for $i = 1, 2, \ldots, m$, where the $\lambda_i$'s are not all equal.
The definition of PBIBD given above is essentially due to Bose and Nair (1939), with a slight modification by Nair and Rao (1942). The numbers v, b, r, k, $\lambda_1, \ldots, \lambda_m$ are called parameters of the first kind, whereas the numbers $n_1, n_2, \ldots, n_m$ and $p^i_{jk}$ are called parameters of the second kind. It is possible for different PBIB designs to have the same association scheme but different parameters of the first kind. The important idea of separating a PBIBD from its association scheme is due to Bose and Shimamoto, as mentioned earlier. In what follows, a crucial role in establishing the existence of some symmetrical BIB designs is played by certain types of association schemes, and PBIB designs are used only in so far as they provide the existence of these association schemes.

2. CONSTRUCTION OF SYMMETRICAL BIB DESIGNS

Consider an m classes association scheme for v treatments with parameters $n_i$, $p^i_{jk}$, $i, j, k = 1, 2, \ldots, m$. The following relations are known to hold:

$$\sum_{i=1}^{m} n_i = v - 1, \qquad (2.1)$$

$$\sum_{k=1}^{m} p^i_{jk} = \begin{cases} n_j & \text{if } i \neq j, \\ n_j - 1 & \text{if } i = j. \end{cases} \qquad (2.2)$$

We write the parameters $p^i_{jk}$ in the form of m matrices $P_i = (p^i_{jk})$, $i, j, k = 1, 2, \ldots, m$, which are necessarily symmetrical. In view of (2.2), the sum of the elements in the j-th row of $P_i$ is $n_j$ if $j \neq i$ and $n_i - 1$ if $j = i$.

Now suppose that the matrices $P_1, P_2, \ldots, P_m$ satisfy the condition that, in each of them, the sum of the elements in the square submatrix corresponding to rows and columns numbered $i_1, i_2, \ldots, i_p$ ($p < m$) is a constant, say Λ. Define two treatments to be α-th associates if they are $i_1$-th, $i_2$-th, ..., or $i_p$-th associates; otherwise define them as β-th associates. Obviously, then, the numbers $n_\alpha$ and $n_\beta$ of α-th and β-th associates of a treatment are given by

$n_\alpha = n_{i_1} + n_{i_2} + \cdots + n_{i_p}$,
$n_\beta = (v-1) - n_\alpha$. (2.3)

If two treatments θ and φ are α-th associates, the number $p^\alpha_{\alpha\alpha}$ of treatments which are simultaneously α-th associates of θ and φ is the number of treatments common to the α-th associates of θ and of φ. If θ and φ are $i_1$-th associates, this number is easily seen to be the sum of the elements in the square submatrix of $P_{i_1}$ in rows and columns corresponding to $i_1, i_2, \ldots, i_p$, and hence is equal to Λ. Similarly this number is Λ if θ and φ are $i_2$-th, ..., or $i_p$-th associates. Hence $p^\alpha_{\alpha\alpha} = \Lambda$ irrespective of which two treatments we start with, so long as they are α-th associates. A similar argument shows that $p^\beta_{\alpha\alpha} = \Lambda$ for any two treatments which are β-th associates. It now follows from Bose and Clatworthy (1955) that we get a two associate classes association scheme with parameters $n_\alpha$, $n_\beta$ and $p^i_{jk}$, $i, j, k = \alpha, \beta$, such that

$$P_\alpha = \begin{pmatrix} \Lambda & n_\alpha - 1 - \Lambda \\ n_\alpha - 1 - \Lambda & n_\beta - n_\alpha + \Lambda + 1 \end{pmatrix}, \qquad P_\beta = \begin{pmatrix} \Lambda & n_\alpha - \Lambda \\ n_\alpha - \Lambda & n_\beta - n_\alpha + \Lambda - 1 \end{pmatrix}. \qquad (2.4)$$


We can, therefore, state the following theorem. 
Theorem 1: If we have an m associate classes association scheme with parameters $n_i$ and matrices $P_i$, $i = 1, 2, \ldots, m$, such that in each of these matrices the sum of the elements standing in rows and columns numbered $i_1, i_2, \ldots, i_p$ is a constant Λ, then we can form a two associate classes association scheme with parameters $n_\alpha$, $n_\beta$ and matrices $P_\alpha$ and $P_\beta$ given by (2.3) and (2.4).
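Theorem 1 can be tried on a toy case of our own choosing (not an example from the paper): the three-class scheme on a $4 \times 4$ array in which two cells are first associates if they share a row, second associates if they share a column, and third associates otherwise. The sketch computes each $P_i$ by brute force (raising an error if some $p^i_{jk}$ depends on the pair chosen), checks that the submatrix on rows and columns 1, 2 has the same sum Λ = 2 in all three matrices, and confirms that merging classes 1 and 2 yields the matrices of (2.4) with $n_\alpha = 6$, $n_\beta = 9$. All function names are ours.

```python
from itertools import combinations


def grid_scheme(n):
    """3-class scheme on an n x n array: 1 = same row, 2 = same column, 3 = other."""
    trts = [(i, j) for i in range(n) for j in range(n)]

    def cls(a, b):
        if a[0] == b[0]:
            return 1
        if a[1] == b[1]:
            return 2
        return 3
    return trts, cls


def intersection_matrices(trts, cls, m):
    """[P_1, ..., P_m] with P_i = (p^i_jk); raises if a count depends on the pair."""
    P = {}
    for x, y in combinations(trts, 2):
        M = [[0] * m for _ in range(m)]
        for z in trts:
            if z != x and z != y:
                M[cls(x, z) - 1][cls(y, z) - 1] += 1
        if P.setdefault(cls(x, y), M) != M:
            raise ValueError("not an association scheme")
    return [P[i] for i in range(1, m + 1)]


def block_sum(M, idx):
    """Sum of the square submatrix of M on rows and columns idx (1-based)."""
    return sum(M[j - 1][k - 1] for j in idx for k in idx)


trts, cls = grid_scheme(4)
print([block_sum(P, [1, 2]) for P in intersection_matrices(trts, cls, 3)])
```

For a $5 \times 5$ array the three sums are 3, 3 and 2, so the hypothesis of the theorem fails there.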

Theorem 2: If there exists a two classes association scheme with parameters $n_1$, $n_2$ and matrices $P_1$ and $P_2$ such that either

(i) $p^1_{11} = p^2_{11} = \Lambda$ or (ii) $p^1_{22} = p^2_{22} = \Lambda$,

then we can construct a symmetrical BIBD with parameters v, $r = n_1$, $\lambda = \Lambda$ or v, $r = n_2$, $\lambda = \Lambda$, according as condition (i) or (ii) is satisfied.

Proof: Suppose condition (i) is satisfied. Consider the v blocks formed by the first associates of each of the v treatments. In virtue of (i), any two of these blocks intersect in exactly Λ treatments. We thus have v blocks of size $k = n_1$ involving v treatments. Since each treatment has exactly $n_1$ first associates, each treatment occurs in exactly $r = n_1$ of these blocks. It then follows from results of Chowla and Ryser (1950) that we have a symmetric BIBD with parameters v, $r = n_1$, $\lambda = \Lambda$. In the same manner, if condition (ii) is satisfied we get a symmetrical BIBD with parameters v, $r = n_2$, $\lambda = \Lambda$.

Corollary 1: Suppose there exists a BIBD with $\lambda = 1$ and $r = 2k+1$. Then we can construct a symmetrical BIBD with parameters

$v = 4k^2 - 1$, $r = 2k^2$, $\lambda = k^2$, (2.5)

or equivalently a Hadamard matrix of order $4k^2$.

Proof: Using the conditions (1.1), it is easy to verify that the parameters of the given BIBD are

$v = k(2k-1)$, $b = 4k^2 - 1$, $r = 2k+1$, block size k, $\lambda = 1$.

As shown by Shrikhande (1952), the dual of this design is a PBIBD with two associate classes with parameters

$v = 4k^2 - 1$, $n_1 = 2k^2$, $p^1_{11} = p^2_{11} = k^2$.

The result now follows from Theorem 2, coupled with the equivalence of a BIBD with parameters (2.5) and a Hadamard matrix of order $4k^2$ (Todd, 1933).
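Corollary 1 for k = 2 can be traced end to end. The BIBD with λ = 1, r = 5 and blocks of size 2 is the set of 15 pairs from six symbols; in its dual, two pairs are first associates when they share a symbol. The sketch applies Theorem 2 to obtain the (15, 8, 4) design of (2.5), then borders the ±1 matrix $J - 2N$ (rows indexed by blocks) with a row and a column of ones. This bordering is one standard way of passing from such a design to a Hadamard matrix of order $4k^2 = 16$; it is our choice of illustration, and the helper names are ours.

```python
from itertools import combinations


def corollary1_k2():
    """Dual of the 15 pairs from 6 symbols, turned into blocks by Theorem 2 (i):
    block e = the pairs sharing exactly one symbol with pair e."""
    edges = list(combinations(range(6), 2))
    blocks = [{j for j, f in enumerate(edges) if f != e and set(e) & set(f)}
              for e in edges]
    return len(edges), blocks


def hadamard_from_design(v, blocks):
    """Border J - 2N with ones, for a symmetric (4t - 1, 2t, t) design."""
    core = [[-1 if x in B else 1 for x in range(v)] for B in blocks]
    return [[1] * (v + 1)] + [[1] + row for row in core]


def is_hadamard(H):
    """Check H H' = n I entry by entry."""
    n = len(H)
    return all(sum(a * b for a, b in zip(H[i], H[j])) == (n if i == j else 0)
               for i in range(n) for j in range(n))


v, blocks = corollary1_k2()
print(v, len(blocks[0]), is_hadamard(hadamard_from_design(v, blocks)))
```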

Example 1: BIB designs of the above corollary are known to exist for k = 2, 3, 4, 5, 6, 7 from Fisher and Yates's Tables (1949) and Rao (1961). This guarantees the existence of BIB designs with parameters (2.5) corresponding to v = 15, 35, 63, 99, 143 and 195.

Corollary 2: Existence of $k-2$ mutually orthogonal latin squares (m.o.l.s.) of order 2k implies the existence of a symmetrical BIBD with parameters

$v = 4k^2$, $r = k(2k-1)$, $\lambda = k(k-1)$. (2.6)

Proof: From Bose and Shimamoto (1952), the existence of $k-2$ m.o.l.s. of order 2k implies the existence of a two associate classes association scheme with $v = 4k^2$, $n_1 = k(2k-1)$, $p^1_{11} = p^2_{11} = k(k-1)$. The result now follows from Theorem 2.
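For the smallest cases, the construction in the proof of Theorem 2 can be run directly on the scheme of Corollary 2. The sketch assumes the cyclic Latin square $(i + j) \bmod 2k$ as the single square needed when k = 3 (no square is needed when k = 2); each block consists of the first associates of one treatment, and the helper checks that the result is a symmetric BIBD with the parameters (2.6). All function names are ours.

```python
from itertools import combinations


def latin_square_scheme(k):
    """Hypothetical L_k(2k) instance for k = 2 or 3: cells of a 2k x 2k array,
    first associates if they share a row, a column or (k = 3 only) a symbol
    of the cyclic Latin square (i + j) mod 2k."""
    if k not in (2, 3):
        raise ValueError("this sketch only covers k = 2 and k = 3")
    n = 2 * k
    cells = [(i, j) for i in range(n) for j in range(n)]

    def first(a, b):
        if a == b:
            return False
        if a[0] == b[0] or a[1] == b[1]:
            return True
        return k == 3 and (a[0] + a[1]) % n == (b[0] + b[1]) % n
    return cells, first


def theorem2_blocks(cells, first):
    """Condition (i): one block of first associates per treatment."""
    return [{c for c in cells if first(t, c)} for t in cells]


def symmetric_params(cells, blocks):
    """(v, r, lambda) if the blocks form a symmetric BIBD, else None."""
    ks = {len(B) for B in blocks}
    lams = {sum(x in B and y in B for B in blocks)
            for x, y in combinations(cells, 2)}
    if len(blocks) != len(cells) or len(ks) != 1 or len(lams) != 1:
        return None
    return len(cells), ks.pop(), lams.pop()


for k in (2, 3):
    cells, first = latin_square_scheme(k)
    print(k, symmetric_params(cells, theorem2_blocks(cells, first)))
```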
Example 2: Taking k = 2, 3, 4, the conditions of the corollary are obviously satisfied. For k = 6, the existence of 5 m.o.l.s. of order 12 is guaranteed by Bose, Chakravarti and Knuth (1960). Hence a BIBD with parameters (2.6) exists for these values of k.

It is conjectured by Dulmage, Johnson and Mendelsohn (1961) that $2p-1$ m.o.l.s. of order 4p exist if p is a prime. If this conjecture is true, it will guarantee the existence of the symmetrical BIBD with parameters

$v = 16p^2$, $r = 2p(4p-1)$, $\lambda = 2p(2p-1)$

for every prime p.
Corollary 3: If $s = 2^n$, then the symmetrical BIBD with parameters

$v = s^2(s+2)$, $r = s(s+1)$, $\lambda = s$

exists.

Proof: If $s = 2^n$, it is known from the results of Ray Chaudhuri (1959) that a PBIBD with two associate classes exists with the following parameters:

$v = s^3$, $b = s^2(s+2)$, $r = s+2$, $k = s$, $\lambda_1 = 1$, $\lambda_2 = 0$,
$n_1 = (s+2)(s-1)$, $p^1_{11} = s-2$, $p^2_{11} = s+2$.

This design belongs to the series given by Bose and Clatworthy (1955) with $t = 1$ in their notation, and its dual is another PBIBD with two associate classes and parameters

$v = s^2(s+2)$, $b = s^3$, $r = s$, $k = s+2$, $\lambda_1 = 1$, $\lambda_2 = 0$,
$n_1 = s(s+1)$, $p^1_{11} = p^2_{11} = s$.

The result now follows from Theorem 2.

Example 3: Taking s = 2, 4, 8 we get designs with parameters (16, 6, 2), (96, 20, 4) and (640, 72, 8). The design (96, 20, 4) is mentioned as an unsolved case for difference sets in Marshall Hall (1956).
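The arithmetic behind Corollary 3 and Example 3 is easy to check mechanically. The sketch below (helper names ours) takes the parameters of the PBIBD quoted in the proof, verifies $vr = bk$ and $n_1 = r(k-1)$ for it (the latter because $\lambda_1 = 1$, $\lambda_2 = 0$), and checks that the resulting $(v, r, \lambda)$ satisfies $\lambda(v-1) = r(r-1)$. It checks parameters only and does not construct any design.

```python
def ray_chaudhuri_pbibd(s):
    """(v, b, r, k, n1) of the two-class PBIBD quoted in the proof."""
    return s ** 3, s * s * (s + 2), s + 2, s, (s + 2) * (s - 1)


def corollary3_design(n):
    """(v, r, lambda) of the symmetric BIBD of Corollary 3 for s = 2**n."""
    s = 2 ** n
    v, b, r, k, n1 = ray_chaudhuri_pbibd(s)
    assert v * r == b * k             # vr = bk for the quoted PBIBD
    assert n1 == r * (k - 1)          # lambda_1 = 1, lambda_2 = 0
    # The dual has b treatments; Theorem 2 uses its n_1 = s(s + 1) and Lambda = s.
    sym_v, sym_r, sym_lam = b, s * (s + 1), s
    assert sym_lam * (sym_v - 1) == sym_r * (sym_r - 1)
    return sym_v, sym_r, sym_lam


print([corollary3_design(n) for n in (1, 2, 3)])
```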

Example 4: Consider the association scheme for the design SI 14 given by Bose, Clatworthy and Shrikhande (1954), having the parameters $v = 45$, $n_1 = 12$, $p^1_{11} = p^2_{11} = 3$. By Theorem 2 this gives the symmetrical design with parameters (45, 12, 3), which is left as an unsolved case by Rao (1961).




3. IMPOSSIBILITY OF CERTAIN ASSOCIATION SCHEMES 


Connor and Clatworthy (1954) have given the following results on the impossibility of an association scheme with two associate classes. Defining

$\Delta = \gamma^2 + 2\beta + 1$, $\gamma = p^2_{12} - p^1_{12}$, $\beta = p^1_{12} + p^2_{12}$, (3.1)
$2\sqrt{\Delta}\,\eta = (v-1)(1-\gamma) - 2n_1$,
$2\alpha_1 = (v-1) + 2\eta$, $2\alpha_2 = (v-1) - 2\eta$,

they state the following theorems.


Theorem A: If in a PBIBD with two associate classes v is odd and Δ is not an integral square, then it is necessary that

(i) $p^1_{12} = p^2_{11} = p^2_{12} = (v-1)/4$,
(ii) $\gamma = \eta = 0$,
and (iii) $v = \Delta = 4t+1$.

Theorem B: If in a PBIBD with two associate classes v is odd and Δ is an integral square, then η must be an integer.

Theorem C: If in a PBIBD with two associate classes v is even, then it is necessary that

(i) Δ must be an integral square,
and (ii) 2η must be an odd (positive or negative) integer.


Consider a two classes association scheme given by one of the following, where 4t is not an integral square:

$v = 4t-1$, $n_1 = 2t-1$, $p^1_{11} = p^2_{11} = t-1$ (3.2)
$v = 4t-1$, $n_1 = 2t$, $p^1_{11} = p^2_{11} = t$ (3.3)
$v = 4t-1$, $n_2 = 2t-1$, $p^1_{22} = p^2_{22} = t-1$ (3.4)
$v = 4t-1$, $n_2 = 2t$, $p^1_{22} = p^2_{22} = t$ (3.5)


In all these cases it is easy to verify that $\Delta = 4t$, which is not an integral square. Since $v = 4t-1 \neq \Delta$, condition (iii) of Theorem A fails, and hence all these association schemes are impossible. Thus a Hadamard type symmetric BIBD with parameters $(4t-1, 2t-1, t-1)$ or $(4t-1, 2t, t)$ cannot be obtained by our method if 4t is not an integral square. As is well known, such designs do exist for a large number of values of t, and it is actually conjectured that they exist for all t. Bose and Shrikhande (1959) give a complete bibliography of the existence problem of such designs.
We now give an example of an association scheme which satisfies all the conditions of Connor and Clatworthy and yet is impossible. Consider a two classes association scheme with

$v = 4t+1$, $n_1 = n_2 = 2t$,

$$P_1 = \begin{pmatrix} t-1 & t \\ t & t \end{pmatrix}, \qquad P_2 = \begin{pmatrix} t & t \\ t & t-1 \end{pmatrix}. \qquad (3.6)$$

It is easy to verify that we get $\Delta = 4t+1$ and $\gamma = \eta = 0$. If $4t+1$ is not an integral square, then all the conditions of Theorem A are satisfied, and if $4t+1$ is an integral square, then again all the conditions of Theorem B are satisfied.
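The quantities in (3.1) can be computed from v, $n_1$, $p^1_{11}$ and $p^2_{11}$ alone, since (2.2) gives $p^1_{12} = n_1 - 1 - p^1_{11}$ and $p^2_{12} = n_1 - p^2_{11}$. The sketch below (function names ours) uses the sign convention $\gamma = p^2_{12} - p^1_{12}$ written in (3.1); Δ does not depend on that choice. It confirms Δ = 4t for (3.2) and (3.3), and Δ = 4t + 1 with γ = η = 0 for (3.6). As a further illustration, for t = 4 (where 4t = 16 is a square) the scheme (3.2) gives η = −7/4, so Theorem B rules it out as well.

```python
from fractions import Fraction
from math import isqrt


def cc_invariants(v, n1, p1_11, p2_11):
    """gamma, beta, Delta and 2*eta*sqrt(Delta) of (3.1) for a two-class scheme."""
    p1_12 = n1 - 1 - p1_11        # row sum of P_1, relation (2.2)
    p2_12 = n1 - p2_11            # row sum of P_2, relation (2.2)
    gamma = p2_12 - p1_12
    beta = p1_12 + p2_12
    delta = gamma ** 2 + 2 * beta + 1
    return gamma, beta, delta, (v - 1) * (1 - gamma) - 2 * n1


def eta(v, n1, p1_11, p2_11):
    """eta as an exact Fraction when Delta is a perfect square, otherwise None."""
    *_, delta, rhs = cc_invariants(v, n1, p1_11, p2_11)
    root = isqrt(delta)
    return Fraction(rhs, 2 * root) if root * root == delta else None


t = 3
print(cc_invariants(4 * t - 1, 2 * t - 1, t - 1, t - 1))  # scheme (3.2)
print(cc_invariants(4 * t + 1, 2 * t, t - 1, t))          # scheme (3.6)
```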


Now define a square matrix $M = (m_{ij})$ of order $4t+1$ by

$m_{ii} = 0$,
$m_{ij} = 1$ if treatments i and j are first associates,
$m_{ij} = -1$ otherwise. (3.7)

Then, utilising (3.6), it is easy to verify that

$MM' = (4t+1)I_{4t+1} - G_{4t+1}$, (3.8)

where $M'$ is the transpose of M, and $I_n$ and $G_n$ respectively denote the identity matrix of order n and the square matrix of order n with all elements equal to 1. The matrix $MM'$ is obviously singular, since each of its rows sums to zero.


Define the square matrix N of order $4t+2$ by

$$N = \begin{pmatrix} 0 & J \\ J' & M \end{pmatrix}, \qquad (3.9)$$

where J is a row vector with $4t+1$ components all equal to 1. Then

$NN' = (4t+1)I_{4t+2}$. (3.10)
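One family of schemes with the parameters (3.6) is the Paley scheme: take $q = 4t+1$ prime and call i and j first associates when $i - j$ is a nonzero square mod q. This example is our choice, not the authors', and the sketch handles prime q only (prime powers would need finite-field arithmetic). It builds M from (3.7) with the Legendre symbol, borders it as in (3.9), and checks (3.8) and (3.10) by direct multiplication. For such q the square-free part of $4t+1$ is q itself, which is congruent to 1 (mod 4), so there is no conflict with the nonexistence result that follows.

```python
def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q, by Euler's criterion."""
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1


def paley_M(q):
    """M of (3.7) for the Paley scheme mod a prime q = 4t + 1."""
    return [[legendre(i - j, q) for j in range(q)] for i in range(q)]


def bordered_N(M):
    """N of (3.9): a zero corner, then a row and a column of ones around M."""
    return [[0] + [1] * len(M)] + [[1] + row for row in M]


def gram(A):
    """A A' with plain lists."""
    return [[sum(x * y for x, y in zip(r, s)) for s in A] for r in A]


q = 13   # t = 3
N = bordered_N(paley_M(q))
print(all(gram(N)[i][j] == (q if i == j else 0)
          for i in range(q + 1) for j in range(q + 1)))
```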


Utilising the Minkowski-Hasse theorem (Hasse, 1923) as given by Bruck and Ryser (1949), we obtain

$c_p(NN') = (-1, 4t+1)_p$, (3.11)

where p is an odd prime. If p is a factor of the square-free part of $4t+1$ and is congruent to 3 (mod 4), then

$c_p(NN') = -1$. (3.12)

Since $NN'$ is rationally congruent to $I_{4t+2}$,

$c_p(NN') = c_p(I_{4t+2}) = 1$. (3.13)


Hence we get a contradiction. We may, therefore, state

Theorem 3: A two associate classes association scheme given by (3.6) is non-existent if the square-free part of $4t+1$ contains a prime congruent to 3 (mod 4).

Thus for t = 5, 8, 14 the association scheme (3.6) is impossible.
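The list t = 5, 8, 14 can be reproduced by factoring $4t+1$. In the sketch below (helper names ours), the primes dividing the square-free part of n are exactly those occurring to an odd power, and t is excluded when one of them is congruent to 3 (mod 4). Scanning t = 1, ..., 14 recovers exactly the three values quoted.

```python
def odd_exponent_primes(n):
    """Primes dividing the square-free part of n, by trial division."""
    primes, d = [], 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            primes.append(d)
        d += 1
    if n > 1:                 # leftover prime factor with exponent 1
        primes.append(n)
    return primes


def excluded_by_theorem3(t):
    """True if the square-free part of 4t + 1 has a prime factor = 3 (mod 4)."""
    return any(p % 4 == 3 for p in odd_exponent_primes(4 * t + 1))


print([t for t in range(1, 15) if excluded_by_theorem3(t)])
```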
In the proof of the above theorem we have already established that the existence of the association scheme given by (3.6) implies the existence of a square matrix N of order $4t+2$ with components 0, 1, −1 such that $NN' = (4t+1)I_{4t+2}$. We now prove the converse result. Given such a matrix N, it is obvious that 0 occurs once and only once in every row and column, and hence by suitable interchanges of columns we can bring the zeros onto the main diagonal. Again, interchanging 1 and −1 in all the columns except the first, if necessary, we can make all the elements of the first row 1, except for the single 0 in the initial position. Similarly we can change all the elements of the first column, except the single 0, into 1. Hence, without loss of generality, we can write
$$N = \begin{pmatrix} 0 & J \\ J' & M \end{pmatrix},$$


where J and J' are row and column vectors with all components 1. It is easy to see that

$MM' = (4t+1)I_{4t+1} - G_{4t+1}$,

and that each row and column of M contains 1 and −1 exactly 2t times each. Since the scalar product of any two rows of M is −1, and 0 occurs only along the main diagonal, it is easy to verify that M is a symmetric matrix. For suppose, without loss of generality, that the first two rows of M are given by


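Row 1: $\left(0,\ 1,\ \underbrace{1, \ldots, 1}_{2t-1 \text{ times}},\ \underbrace{-1, \ldots, -1}_{2t \text{ times}}\right)$
Row 2: $\left(-1,\ 0,\ \underbrace{1, \ldots, 1}_{\alpha \text{ times}},\ \underbrace{-1, \ldots, -1}_{2t-1-\alpha \text{ times}},\ \underbrace{1, \ldots, 1}_{2t-\alpha \text{ times}},\ \underbrace{-1, \ldots, -1}_{\alpha \text{ times}}\right)$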


Then their scalar product is $4\alpha - 4t + 1$, which can never be equal to −1. Hence the element in position (2, 1) must be the same as the element in position (1, 2), and hence the matrix M must be symmetric. Again, it can easily be verified that if the element (i, j), $i \neq j$, is 1, then rows i and j contain the ordered pairs (1, 1), (1, −1), (−1, 1) and (−1, −1) exactly $t-1$, t, t, t times respectively. Similarly, if the element (i, j) is −1, then these pairs occur t, t, t, $t-1$ times respectively. Identifying the rows and columns of M with a set of $4t+1$ treatments, and taking 1 (−1) in position (i, j) to imply that treatment j is a first (second) associate of treatment i, we obviously have a two classes association scheme with parameters given by (3.6).


In conclusion it may be mentioned that the matrix N has been used by 
Raghavarao (1960) in certain weighing designs, and the nonexistence of N has been 
proved by him under identical conditions. 
INCOMPLETE BLOCK DESIGNS
SUMMARY. A general two-parameter family of Balanced Incomplete Block (BIB) designs is derived and a method of constructing them is given. The connection of a sub-family of these designs with Hadamard matrices and certain optimum minimum distance codes is also established.


1. INTRODUCTION 


In a recent paper P. K. Menon (1962) has given a one-parameter family of difference sets for symmetric BIB designs. In this paper we generalise his results to derive a two-parameter family of BIB designs such that any two members of the family can be used to obtain a new member of the same family. My thanks are due to Menon for having shown me a copy of his manuscript, which led to the present results.


2. NOTATION AND PRELIMINARY RESULTS 
A BIB design with parameters v, b, r, k, λ is an arrangement of v objects or treatments in b sets or blocks such that (i) every block contains $k < v$ different treatments, and (ii) every pair of distinct treatments occurs in exactly λ blocks. Then it is easy to see that every treatment occurs in exactly r blocks, and the parameters satisfy the relations

$\lambda(v-1) = r(k-1), \quad bk = vr, \quad b \ge v.$ (2.1)

The last inequality is due to Fisher (1940). A BIB design is called symmetrical if $b = v$, and hence $r = k$.
Let $N = (n_{ij})$ be the incidence matrix of order (v, b) of a BIB design with the above parameters, i.e.,

$n_{ij} = 1$ if treatment i occurs in block j,
$n_{ij} = 0$ otherwise. (2.2)

Then it is obvious that every row and column of N contains 1 in r and k places respectively, and that any two rows of N contain the pair (1, 1) in exactly λ columns. Conversely, any (0, 1) matrix having these properties can be regarded as the incidence matrix of a BIB design.
If N is the incidence matrix of the above BIB design, then by interchanging 0 and 1 we get

$N^* = J - N$, (2.3)

where J is the matrix of order (v, b) with all elements 1. It is easy to verify that $N^*$ is the incidence matrix of the complementary BIB design with parameters v, b, $b-r$, $v-k$, $b-2r+\lambda$.
If $A = (a_{ij})$ and $B = (b_{ij})$ are two matrices of order (m, n) and (p, q) respectively, then the Kronecker product of A and B, denoted by $A \times B$, is defined by

$C = A \times B = (a_{ij}B)$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. (2.4)

The matrix C is of order (mp, nq). The Kronecker product of matrices has been used by Vartak (1955) in the construction of Partially Balanced Incomplete Block designs.


A Hadamard matrix $H_n$ is a (1, −1) square matrix of order n such that $H_n H_n' = nI_n$, where $I_n$ is the identity matrix of order n. It is known that $H_n$ can exist only when n is either 2 or a multiple of 4. A complete summary of the status of the existence problem is given by Bose and Shrikhande (1959), and it is conjectured that a Hadamard matrix of every possible order exists.


3. COMPOSITION OF BIB DESIGNS
We give here the generalisation of the method given by Menon (1962). 


Theorem 1: If $D_i$ is a BIB design with parameters $v_i, b_i, r_i, k_i, \lambda_i$ belonging to a family (A) with

$b_i = 4(r_i - \lambda_i)$, (3.1)

and $N_i$ and $N_i^*$ are the incidence matrices of $D_i$ and $D_i^*$, where $D_i^*$ is the complementary design of $D_i$, $i = 1, 2$, then

$N = N_1 \times N_2 + N_1^* \times N_2^*$ (3.2)

is the incidence matrix of a BIB design D with parameters v, b, r, k, λ given by

$v = v_1 v_2$, $b = b_1 b_2$,
$r = r_1 r_2 + (b_1 - r_1)(b_2 - r_2)$,
$k = k_1 k_2 + (v_1 - k_1)(v_2 - k_2)$,
$\lambda = r - b/4$, (3.3)

and the design D also belongs to the same family (A).


Proof: For any (0, 1) matrix A of order (m, n), define $\tilde A$ to be the corresponding (−1, 1) matrix obtained by changing 0 into −1. Then obviously

$\tilde A = 2A - J_{m,n}$. (3.4)

Hence

$$\tilde N_1 \times \tilde N_2 = (2N_1 - J_{v_1,b_1}) \times (2N_2 - J_{v_2,b_2}) = 2[N_1 \times N_2 + (J_{v_1,b_1} - N_1) \times (J_{v_2,b_2} - N_2)] - J_{v_1 v_2,\, b_1 b_2} = 2N - J_{v_1 v_2,\, b_1 b_2}. \qquad (3.5)$$

If we call $\tilde N_i$, $i = 1, 2$, the (−1, 1) incidence matrix of $D_i$, then it has been shown by Sillitto (1957) that $\tilde N_1 \times \tilde N_2$ is the (−1, 1) incidence matrix of D if the conditions of Theorem 1 are satisfied. Hence (3.4) and (3.5) imply that N is the incidence matrix of the design D. Obviously D belongs to the same family (A).
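The composition (3.2) is easy to carry out on small incidence matrices. The sketch below uses plain Python lists and helper names of our own. Its two inputs are members of the family (A) that appear later in the paper: $A_2(3, 1)$, the plane EG(2, 3), and $A_1(2, 1)$, whose incidence matrix is the $4 \times 4$ identity. It checks that the composite is a BIB design with the parameters predicted by (3.3), and verifies the identity (3.5) entry by entry.

```python
def kron(A, B):
    """Kronecker product A x B of list-of-lists matrices, as in (2.4)."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def complement(N):
    """N* = J - N of (2.3)."""
    return [[1 - x for x in row] for row in N]


def plus_minus(N):
    """The (-1, 1) matrix 2N - J of (3.4)."""
    return [[2 * x - 1 for x in row] for row in N]


def compose(N1, N2):
    """N = N1 x N2 + N1* x N2*, equation (3.2)."""
    P, Q = kron(N1, N2), kron(complement(N1), complement(N2))
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def bibd_parameters(N):
    """(v, b, r, k, lambda) of a (0, 1) incidence matrix (rows = treatments),
    or None if it is not a BIB design."""
    v, b = len(N), len(N[0])
    rs = {sum(row) for row in N}
    ks = {sum(N[i][j] for i in range(v)) for j in range(b)}
    lams = {sum(x * y for x, y in zip(N[i], N[j]))
            for i in range(v) for j in range(i + 1, v)}
    if len(rs) == len(ks) == len(lams) == 1:
        return v, b, rs.pop(), ks.pop(), lams.pop()
    return None


# A_2(3, 1): the plane EG(2, 3); lines y = mx + c and x = c over Z_3.
POINTS = [(x, y) for x in range(3) for y in range(3)]
LINES = ([{(x, (m * x + c) % 3) for x in range(3)} for m in range(3) for c in range(3)]
         + [{(c, y) for y in range(3)} for c in range(3)])
EG23 = [[int(p in L) for L in LINES] for p in POINTS]
# A_1(2, 1): v = b = 4, r = k = 1, lambda = 0, i.e. the identity matrix.
I4 = [[int(i == j) for j in range(4)] for i in range(4)]

print(bibd_parameters(compose(EG23, I4)))   # (3.3) predicts (36, 48, 28, 21, 16)
```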
4. PARAMETERS OF THE FAMILY (A) OF BIB DESIGNS
In this section we derive the parameters of the family (A) in a form which is equivalent to that given by Sillitto (1957) but which is more directly useful to us.


Let D, belonging to the family (A), have parameters v, b, r, k, λ. In virtue of $b = 4(r-\lambda)$ and (2.1) we have

$k = r(r-\lambda)(r-2\lambda)^{-2}$, $v = 4(r-\lambda)^2(r-2\lambda)^{-2}$. (4.1)

Since v is an integer and, by (4.1), $\sqrt{v} = 2 + 2\lambda/(r-2\lambda)$ is rational, $2\lambda/(r-2\lambda)$ must be an integer, which is either even or odd. If $2\lambda/(r-2\lambda)$ is even, $= 2m$, where m is an integer, then $r = \lambda(1+2m)/m$. Since r is an integer we have $\lambda = mt$, where t is an integer. Then from (4.1) and (2.1) we get

$v = [2(1+m)]^2$, $k = (1+m)(1+2m)$, $r = t(1+2m)$, $b = 4(1+m)t$, $\lambda = mt$.

Putting $2(1+m) = s$, we get the sub-family of (A) with parameters

$v = s^2$, $b = 2st$, $r = (s-1)t$, $k = s(s-1)/2$, $\lambda = (s-2)t/2$, (4.2)


where s and t are integers, s is even and $2(s-1)t \ge s(s-1)$. Since b is positive, s and t in (4.2) are either both positive or both negative. By simultaneously changing the signs of s and t in (4.2) we get

$v = s^2$, $b = 2st$, $r = (s+1)t$, $k = s(s+1)/2$, $\lambda = (s+2)t/2$, (4.3)

where $2(s+1)t \ge s(s+1)$. The parameters (4.2) and (4.3) are parameters of complementary designs, and given either one of them the other can always be constructed. Hence, without loss of generality, we may define the sub-family $(A_1)$ of (A) by the integers s and t, where the parameters of the design $A_1(s, t)$ are

$v = s^2$, $b = 2st$, $r = (s-1)t$, $k = s(s-1)/2$, $\lambda = (s-2)t/2$, (4.4)

and s and t are positive integers, s is even and $2t \ge s$.
Similarly, if $2\lambda/(r-2\lambda)$ is odd, $= 2m+1$, then $r = 4\lambda(1+m)/(1+2m)$. Since r is an integer, this implies that $\lambda = (1+2m)t$. Proceeding as before and putting $2m+3 = s$, where s is odd, we get the series with

$v = s^2$, $b = 4st$, $r = 2(s-1)t$, $k = s(s-1)/2$, $\lambda = (s-2)t$, (4.5)

with $4(s-1)t \ge s(s-1)$. As before, by simultaneously changing the signs of s and t, we get the complementary design with

$v = s^2$, $b = 4st$, $r = 2(s+1)t$, $k = s(s+1)/2$, $\lambda = (s+2)t$, (4.6)

with $4(s+1)t \ge s(s+1)$. Hence we can define the sub-series $(A_2)$ of (A) given by the integers s and t, where the parameters of $A_2(s, t)$ are

$v = s^2$, $b = 4st$, $r = 2(s-1)t$, $k = s(s-1)/2$, $\lambda = (s-2)t$, (4.7)

where s and t are positive integers, s is odd and $4t \ge s$. Replacing s by 2s and t by s in (4.4) we get

$v = b = 4s^2$, $r = k = s(2s-1)$, $\lambda = s(s-1)$, (4.8)

which is the series given by Menon (1962).
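The derivation can be spot-checked numerically. The sketch below (function names ours) encodes (4.4) and (4.7), tests each member against (2.1) and the defining condition $b = 4(r - \lambda)$, recomputes v and k from r and λ through (4.1) with exact fractions, and confirms that the substitution leading to (4.8) gives Menon's parameters.

```python
from fractions import Fraction


def A1(s, t):
    """(v, b, r, k, lambda) of A_1(s, t), equation (4.4); s even, 2t >= s."""
    return s * s, 2 * s * t, (s - 1) * t, s * (s - 1) // 2, (s - 2) * t // 2


def A2(s, t):
    """(v, b, r, k, lambda) of A_2(s, t), equation (4.7); s odd, 4t >= s."""
    return s * s, 4 * s * t, 2 * (s - 1) * t, s * (s - 1) // 2, (s - 2) * t


def in_family_A(v, b, r, k, lam):
    """Relations (2.1) plus the defining condition b = 4(r - lambda)."""
    return (lam * (v - 1) == r * (k - 1) and b * k == v * r and b >= v
            and b == 4 * (r - lam))


def v_k_from_r_lambda(r, lam):
    """Equation (4.1): v and k are determined by r and lambda."""
    d = Fraction(r - 2 * lam) ** 2
    return 4 * (r - lam) ** 2 / d, r * (r - lam) / d


print(A1(4, 3), in_family_A(*A1(4, 3)))   # Bhattacharya's (16, 24, 9, 6, 3)
```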
5. COMPOSITION OF MEMBERS OF THE FAMILY (A)


Let $A_1(s_1, t_1)$ and $A_1(s_2, t_2)$ be two members of $(A_1)$ given by the parameters (4.4). Using the composition theorem, it is easily seen that we get a BIB design whose complementary design is the design $A_1(s_1 s_2, 2t_1 t_2)$. The particular case of this result when $s_1 = 2t_1$ and $s_2 = 2t_2$ is given by Menon (1962), when the original designs are obtained by the method of difference sets.

In a similar manner it can be shown that the existence of $A_1(s_1, t_1)$ and $A_2(s_2, t_2)$ implies the existence of $A_1(s_1 s_2, 4t_1 t_2)$, and that the existence of $A_2(s_1, t_1)$ and $A_2(s_2, t_2)$ implies the existence of $A_2(s_1 s_2, 4t_1 t_2)$.
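The parameter bookkeeping of this section follows from (3.3). The sketch below (names ours) composes parameter tuples with (3.3), takes the complement, and compares the result with $A_1(s_1 s_2, 2t_1 t_2)$, $A_1(s_1 s_2, 4t_1 t_2)$ and $A_2(s_1 s_2, 4t_1 t_2)$ over a range of small s and t. It checks arithmetic only; the existence of the composite designs is what Theorem 1 supplies.

```python
def A1(s, t):
    """(v, b, r, k, lambda) of A_1(s, t), equation (4.4)."""
    return s * s, 2 * s * t, (s - 1) * t, s * (s - 1) // 2, (s - 2) * t // 2


def A2(s, t):
    """(v, b, r, k, lambda) of A_2(s, t), equation (4.7)."""
    return s * s, 4 * s * t, 2 * (s - 1) * t, s * (s - 1) // 2, (s - 2) * t


def compose_params(d1, d2):
    """(3.3): parameters of N1 x N2 + N1* x N2*."""
    (v1, b1, r1, k1, _), (v2, b2, r2, k2, _) = d1, d2
    b = b1 * b2
    r = r1 * r2 + (b1 - r1) * (b2 - r2)
    k = k1 * k2 + (v1 - k1) * (v2 - k2)
    return v1 * v2, b, r, k, r - b // 4


def complement_params(d):
    """Complementary design: v, b, b - r, v - k, b - 2r + lambda."""
    v, b, r, k, lam = d
    return v, b, b - r, v - k, b - 2 * r + lam


print(complement_params(compose_params(A1(4, 2), A2(3, 1))), A1(12, 8))
```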

Theorems 5 and 6 of Menon (1962) imply the existence of $A_1(2^m 6^n, 2^{m-1} 6^n)$ for non-negative integers m and n such that $m + n \ge 1$. The design with parameters (16, 24, 9, 6, 3) due to Bhattacharya (1944) is the design $A_1(4, 3)$, and the design corresponding to the finite Euclidean plane EG(2, 3) is the design $A_2(3, 1)$. All these designs can be used to generate infinitely many members of the family (A). Further, from the designs $A_1(4, 3)$ and $A_1(4, 2)$, by suitable repetitions we get $A_1(4, t)$ for all $t \ge 2$. An independent and non-isomorphic solution of $A_1(4, 5)$ is given by Rao (1961).


6. A FAMILY OF SYMMETRIC BIB DESIGNS

If a symmetric BIB design with parameters (v, r, λ) exists, then by the process of block section (Bose, 1939) we get a BIB design with parameters $(v-r, v-1, r, r-\lambda, \lambda)$. If this belongs to the family (A), then $v - 1 = 4(r-\lambda)$. Combined with $\lambda(v-1) = r(r-1)$, this shows that $r = (r-2\lambda)^2$, so that r is a perfect square. Putting $r = s^2$, we have $\lambda = s(s \pm 1)/2$, and hence we get two series with

$v = s^2 + (s+1)^2$, $r = s^2$, $\lambda = s(s-1)/2$ (6.1)

and

$v = (s-1)^2 + s^2$, $r = s^2$, $\lambda = s(s+1)/2$. (6.2)

By putting $s+1$ for s in (6.2) and taking the complementary design, we get a design with parameters (6.1).


From (6.1) and (6.2), using the methods of block section and block intersection (Bose, 1939), we get the following designs:

$v = (s+1)^2$, $b = 2s(s+1)$, $r = s^2$, $k = s(s+1)/2$, $\lambda = s(s-1)/2$ (6.3)
$v = s^2$, $b = 2s(s+1)$, $r = s^2-1$, $k = s(s-1)/2$, $\lambda = (s+1)(s-2)/2$ (6.4)
$v = (s-1)^2$, $b = 2s(s-1)$, $r = s^2$, $k = s(s-1)/2$, $\lambda = s(s+1)/2$ (6.5)
$v = s^2$, $b = 2s(s-1)$, $r = s^2-1$, $k = s(s+1)/2$, $\lambda = (s+2)(s-1)/2$ (6.6)


All these four designs belong to the family (A). For example, with s even, (6.4) is $A_1(s, s+1)$ and (6.3) is $A_2(s+1, s/2)$, and the complementaries of (6.6) and (6.5) are $A_1(s, s-1)$ and $A_2(s-1, s/2)$. All these designs can, therefore, be constructed for any s if (6.1) and (6.2) exist for the corresponding values of s. For s = 2 and 3, (6.1) gives the symmetric designs (13, 4, 1) and (25, 9, 3), which are known to exist. It is easy to verify that if p is any odd prime factor of $r - \lambda$ in (6.1) or (6.2), then λ is either 0 or 1 (mod p). Hence the theorems of Chowla and Ryser (1950) on the impossibility




ON A TWO PARAMETER FAMILY OF BIBD 


of symmetric BIB designs do not yield any positive result. For s = 4, 5, 6, 7, 8, 9, 
10 and 11, it can be proved that the corresponding residue difference sets (mod v) 
do not exist by using the results of Hall and Ryser (1951) and Hall (1956). It may be 
noted that the solutions to the designs (6.3)—(6.6) may exist even when the designs 
(6.1) and (6.2) are impossible. 
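The relations among (6.1) to (6.6) are parameter arithmetic and can be verified mechanically. The sketch below (helper names ours) applies block section and block intersection to (6.1) and (6.2), checks the identifications with $A_1$ and $A_2$ stated above for even s, and checks the complementation step linking (6.2) and (6.1). It deals with parameters only: nothing here decides whether the symmetric designs exist.

```python
def complement(d):
    """Complementary design of (v, b, r, k, lambda)."""
    v, b, r, k, lam = d
    return v, b, b - r, v - k, b - 2 * r + lam


def residual(v, r, lam):
    """Block section of a symmetric (v, r, lambda) design."""
    return v - r, v - 1, r, r - lam, lam


def derived(v, r, lam):
    """Block intersection of a symmetric (v, r, lambda) design."""
    return r, v - 1, r - 1, lam, lam - 1


def A1(s, t):
    return s * s, 2 * s * t, (s - 1) * t, s * (s - 1) // 2, (s - 2) * t // 2


def A2(s, t):
    return s * s, 4 * s * t, 2 * (s - 1) * t, s * (s - 1) // 2, (s - 2) * t


def symmetric_61(s):
    """(v, r, lambda) of (6.1)."""
    return s * s + (s + 1) ** 2, s * s, s * (s - 1) // 2


def symmetric_62(s):
    """(v, r, lambda) of (6.2)."""
    return (s - 1) ** 2 + s * s, s * s, s * (s + 1) // 2


s = 4
print(residual(*symmetric_61(s)), A2(s + 1, s // 2))   # (6.3) for s = 4
print(derived(*symmetric_61(s)), A1(s, s + 1))         # (6.4) for s = 4
```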


7. CONNECTION WITH HADAMARD MATRICES
Let N denote the incidence matrix of the design (4.8), and let $\tilde N$ denote the corresponding matrix obtained by changing 0 into −1. Any two rows of N contain the ordered pairs (1, 1), (1, 0), (0, 1), (0, 0) exactly $s^2-s$, $s^2$, $s^2$ and $s^2+s$ times respectively. In the corresponding two rows of $\tilde N$ these pairs go into (1, 1), (1, −1), (−1, 1) and (−1, −1). Hence the scalar product of any two rows of $\tilde N$ is $(s^2-s) - s^2 - s^2 + (s^2+s) = 0$. Thus $\tilde N$ is a Hadamard matrix of order $4s^2$. Hence we have the following.
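The counting argument can be illustrated for s = 2, where (4.8) is (16, 6, 2). The sketch below uses one convenient realisation of that design, our own choice: the cells of a $4 \times 4$ array, with row t of N marking the six other cells that share a row or a column with t. It checks the pair counts $s^2-s$, $s^2$, $s^2$, $s^2+s$ for every two rows and then confirms that changing 0 into −1 gives a Hadamard matrix of order 16. Function names are ours.

```python
def design_4_8_s2():
    """A (16, 6, 2) design: row t has a 1 in column u when u != t shares
    a row or a column of a 4 x 4 array with t."""
    cells = [(i, j) for i in range(4) for j in range(4)]
    return [[int(u != t and (u[0] == t[0] or u[1] == t[1])) for u in cells]
            for t in cells]


def pair_pattern(x, y):
    """How often (1,1), (1,0), (0,1), (0,0) occur in two (0,1) rows."""
    pairs = list(zip(x, y))
    return tuple(pairs.count(p) for p in ((1, 1), (1, 0), (0, 1), (0, 0)))


def change_0_to_minus_1(N):
    """The matrix N-tilde of the text."""
    return [[2 * a - 1 for a in row] for row in N]


N = design_4_8_s2()
print(pair_pattern(N[0], N[1]), pair_pattern(N[0], N[5]))
```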


Theorem 2: The existence of a symmetric BIB design with parameters $(4s^2, 2s^2-s, s^2-s)$ implies the existence of a Hadamard matrix $H_{4s^2}$.


This theorem can also be derived in the following manner, which incidentally provides a partial converse to the above theorem. Defining the distance between any two rows of a (0, 1) matrix as the number of places in which they differ, it is easy to verify that

$$A = \begin{pmatrix} N \\ N^* \end{pmatrix}$$

represents a set of $8s^2$ rows with $4s^2$ components in 0 and 1 such that the distance between any two rows is at least $2s^2$. Applying Theorem 1 in Bose and Shrikhande (1959) we get the desired result. To actually construct the Hadamard matrix we proceed as follows. Replace every 0 by 1 in the first row of A, and in the corresponding columns of A interchange 0 and 1, to obtain the matrix B. Each column of A, and hence of B, contains each of the elements 0 and 1 exactly $4s^2$ times. Let $B_1$ be the square submatrix of order $4s^2$ in B such that the first column of $B_1$ contains only the element 1. Replacing 0 by −1 in $B_1$, it is easy to verify, as shown by Bose and Shrikhande (1959), that we get a Hadamard matrix $H_{4s^2}$.


Now suppose $H_{4s^2}$ exists. Without loss of generality we can take the first row and first column to consist entirely of 1's. Then, suppressing this row and column and changing −1 into 0, we get the incidence matrix N of a symmetric BIB design with parameters $(4s^2-1, 2s^2-1, s^2-1)$. Suppose this design satisfies the condition $(C_1)$ that it is possible to find a set of $p = 2s^2-s-1$ columns and a set of p rows of N such that, in these p columns, each of the p rows has exactly $s^2-s-1$ positions occupied by 1 and each of the remaining $q = 2s^2+s$ rows has exactly $s^2-1$ positions occupied by 1. Without loss of generality we can then put

$$N = \begin{pmatrix} N_1 & N_2 \\ N_3 & N_4 \end{pmatrix},$$
where $N_1$ and $N_4$ are square matrices of order p and q respectively. Further, each row of $N_1$, $N_3$, $N_2$ and $N_4$ contains 1 in exactly $s^2-s-1$, $s^2-1$, $s^2+s$ and $s^2$ positions respectively. Now consider

$$M_1 = \begin{pmatrix} N_1 & N_2 \\ N_3^* & N_4^* \end{pmatrix}.$$


It is easy to see that the distance d between any two of the first p rows, or any two of the last q rows, of $M_1$ is $2s^2$, whereas the distance between two rows, one from each of the two sets of p and q rows, is $2s^2-1$. Since the distance between any two rows remains unchanged by interchanging 0 and 1 in any given columns, we see that the same result is true for

$$M_2 = \begin{pmatrix} N_1 & N_2^* \\ N_3^* & N_4 \end{pmatrix}.$$


Obviously every row of $(N_1, N_2^*)$ contains 1 in $(s^2-s-1) + s^2 = 2s^2-s-1$ positions. Similarly every row of $(N_3^*, N_4)$ contains 1 in $(s^2-s) + s^2 = 2s^2-s$ places. Since the distance d between two rows is $r_1 + r_2 - 2\lambda$, where $r_1$ and $r_2$ are the numbers of 1's in the two rows and λ is the number of positions occupied by (1, 1), it is obvious that, for the incidence matrix $M_2$, each of the first p treatments occurs $2s^2-s-1$ times and any two of the p treatments occur together

$$\frac{2(2s^2-s-1) - 2s^2}{2} = s^2-s-1$$

times. Similarly, each of the q treatments occurs $2s^2-s$ times and any two of the q treatments occur together $s^2-s$ times. Further, any one of the p treatments occurs with any one of the q treatments exactly $s^2-s$ times. Noting that each row of $N_3^*$ contains 1 in $s^2-s$ positions, and regarding the rows and columns to correspond with blocks and treatments respectively, the matrix

$$M_3 = \begin{pmatrix} 1 & J_p & 0_q \\ J_p' & N_1 & N_2^* \\ 0_q' & N_3^* & N_4 \end{pmatrix},$$

where $J_p$ and $0_q$ are row vectors of p ones and q zeros,


can be regarded as giving an arrangement of $v = 4s^2$ treatments into v blocks, each containing $k = 2s^2-s$ treatments, such that any two blocks have exactly

$$\frac{k(k-1)}{v-1} = s^2-s$$

treatments in common. From Chowla and Ryser (1950) it follows that $M_3$ represents the incidence matrix of a symmetric BIB design with parameters (4.8). Changing s into −s, i.e., assuming the condition $(C_2)$, we similarly get a symmetric BIB design with parameters $(4s^2, 2s^2+s, s^2+s)$, and hence the complementary design with parameters (4.8). Hence we have the following theorem.

Theorem 3: If there exists a symmetric BIB design with parameters $(4s^2-1, 2s^2-1, s^2-1)$ such that

(i) there exists a set of $2s^2-s-1$ blocks containing each of a set of $2s^2-s-1$ treatments $s^2-s-1$ times and each of the remaining $2s^2+s$ treatments $s^2-1$ times, or

(ii) there exists a set of $2s^2+s-1$ blocks containing each of a set of $2s^2+s-1$ treatments $s^2+s-1$ times and each of the remaining $2s^2-s$ treatments $s^2-1$ times,

then we can construct a symmetric BIB design with parameters $(4s^2, 2s^2-s, s^2-s)$.


Let $e$ be an odd integer such that $e = p^m$ and $e+2 = q^n$, where $p$ and $q$ are odd primes. Let $x$ and $y$ denote primitive elements of $GF(p^m)$ and $GF(q^n)$ respectively. Consider the set of elements $(\alpha, \beta)$, where $\alpha \in GF(p^m)$ and $\beta \in GF(q^n)$. Addition and multiplication are defined by

$$(\alpha_1, \beta_1) + (\alpha_2, \beta_2) = (\alpha_1 + \alpha_2,\ \beta_1 + \beta_2),$$
$$(\alpha_1, \beta_1)(\alpha_2, \beta_2) = (\alpha_1\alpha_2,\ \beta_1\beta_2).$$

Putting $w = (x, y)$, $u = (x, 0)$, $0 = (0, 0)$,


we have the following theorem (Stanton and Sprott, 1958).

Theorem (Stanton and Sprott): The elements

$$0,\ u, u^2, \dots, u^{e-1},\ 1, w, w^2, \dots, w^{(e^2-3)/2}$$

form a difference set for a symmetric BIB design with parameters $(4s^2-1, 2s^2-1, s^2-1)$, where $2s = e+1$.
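For the smallest cases the theorem can be checked directly. The Python sketch below covers only the case where $e$ and $e+2$ are themselves primes, so that the two Galois fields are simply the integers modulo $e$ and $e+2$. It relies on a standard fact: the powers of $w = (x, y)$ are exactly the pairs $(\alpha, \beta)$ with both components non-zero and $\chi(\alpha)\chi(\beta) = +1$, where $\chi$ is the quadratic character. The elements $0, u, \dots, u^{e-1}$ are the pairs $(\alpha, 0)$. The function names are ours.

```python
from itertools import product


def is_square(a, p):
    """Euler's criterion for a non-zero residue a modulo an odd prime p."""
    return pow(a, (p - 1) // 2, p) == 1


def twin_prime_difference_set(e):
    """Difference set in GF(e) x GF(e+2), assuming e and e + 2 are both primes."""
    f = e + 2
    D = {(a, 0) for a in range(e)}                    # 0, u, u^2, ..., u^(e-1)
    D |= {(a, b) for a, b in product(range(1, e), range(1, f))
          if is_square(a, e) == is_square(b, f)}      # the powers of w
    return D


def check_difference_set(e):
    """Count every non-zero difference; each should occur exactly s^2 - 1 times."""
    f, D = e + 2, twin_prime_difference_set(e)
    counts = {}
    for (a1, b1), (a2, b2) in product(D, D):
        if (a1, b1) != (a2, b2):
            d = ((a1 - a2) % e, (b1 - b2) % f)
            counts[d] = counts.get(d, 0) + 1
    v, k, s = e * f, len(D), (e + 1) // 2
    assert (v, k) == (4 * s * s - 1, 2 * s * s - 1)
    assert len(counts) == v - 1 and set(counts.values()) == {s * s - 1}
    return v, k, s * s - 1


print(check_difference_set(3))   # (4s^2 - 1, 2s^2 - 1, s^2 - 1) with s = 2
```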


Since the first component of the elements $0, w, \dots, w^{e-1}$ ranges over $GF(e)$, it is obvious that all the elements of the Galois domain $GD(e(e+2))$ are obtained by letting the second components of these elements range over $GF(e+2)$. Let $C_0, B_1, \dots, B_{e-1}$ denote the sets of $e+2 = 2s+1$ elements obtained in this manner from $0, w, \dots, w^{e-1}$ respectively. The blocks and treatments of the design can then be divided into these subsets.

Regard the difference set as assigning the treatment $0 = (0, 0)$ to the blocks corresponding to the elements of the difference set. We can then obtain the design by developing the initial row, corresponding to the treatment 0, first according to the second component and then according to the first component. It is easy to see that the elements $w^i$, $i = 0, 1, \dots, (e^2-3)/2$, form a multiplicative abelian group of order $(e^2-1)/2$. It is also easy to see that, among the elements of the difference set other than 0, exactly $(e+3)/2 = s+1$ have any given first component different from 0.

Since the treatment 0 occurs only in the block $0 = (0, 0)$ of $C_0$, every treatment of $C_0$ occurs exactly once in the blocks of $C_0$. The treatment $w^i$ occurs in those blocks obtained by adding $x^i$ to the first component of each element of the difference set. The difference set contains exactly $s+1$ elements with first component $-x^i$, so the treatment $w^i$ occurs in $s+1$ blocks of $C_0$; hence each element of $B_i$ also occurs in blocks of $C_0$ exactly $s+1$ times. Obviously $0 = (0, 0)$ occurs in $s+1$ blocks of each $B_i$, and hence each element of $C_0$ occurs in $s+1$ blocks of each $B_i$. Similarly, each element of $B_j$, $j \neq i$, occurs in the blocks of $B_i$ exactly $s+1$ times, whereas each treatment of $B_i$ occurs in these blocks exactly once. The number of times each treatment occurs in the blocks is therefore represented by the scheme


$$\begin{array}{c|cccc}
 & C_0 & B_1 & \cdots & B_{2s-2} \\ \hline
C_0 & 1 & s+1 & \cdots & s+1 \\
B_1 & s+1 & 1 & \cdots & s+1 \\
\vdots & \vdots & & \ddots & \vdots \\
B_{2s-2} & s+1 & s+1 & \cdots & 1
\end{array}$$


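where the rows and columns correspond to sets of treatments and of blocks respectively. Each diagonal element is 1, and each non-diagonal element is $s+1$. Since each set of treatments and each set of blocks contains $2s+1$ entities, all that one has to do is to select $s-1$ columns of the above matrix. This gives $(s-1)(2s+1) = 2s^2-s-1$ blocks, in which each treatment of a selected set occurs $1 + (s-2)(s+1) = s^2-s-1$ times and each of the remaining $s(2s+1) = 2s^2+s$ treatments occurs $(s-1)(s+1) = s^2-1$ times. The blocks thus obtained therefore satisfy condition (i) of Theorem 3. Hence we have the following: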
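The counting in the last paragraph can be sketched as follows. The sketch assumes the scheme above: $2s-1$ sets, each of $2s+1$ elements, with frequency 1 on the diagonal and $s+1$ off it. By symmetry the first $s-1$ column sets are taken. It also checks the incidence identity behind condition (i): the number of selected blocks times the block size $2s^2-1$ equals the sum of the treatment frequencies. The helper name is hypothetical.

```python
def selection_counts(s):
    """Blocks and treatment frequencies when s - 1 of the 2s - 1 block sets are selected."""
    n_sets, size = 2 * s - 1, 2 * s + 1      # C_0, B_1, ..., B_{2s-2}; 2s + 1 elements each
    chosen = set(range(s - 1))               # the selected column (block) sets

    def freq(row, col):
        # scheme above: diagonal element 1, non-diagonal element s + 1
        return 1 if row == col else s + 1

    per_set = {r: sum(freq(r, c) for c in chosen) for r in range(n_sets)}
    inside = {per_set[r] for r in chosen}
    outside = {per_set[r] for r in range(n_sets) if r not in chosen}
    return len(chosen) * size, inside, outside, (n_sets - len(chosen)) * size


for s in range(2, 41):
    blocks, inside, outside, n_outside = selection_counts(s)
    assert blocks == 2 * s * s - s - 1
    assert inside == {s * s - s - 1} and outside == {s * s - 1}
    assert n_outside == 2 * s * s + s
    # blocks x block size of the (4s^2 - 1, 2s^2 - 1, s^2 - 1) design = total incidences
    assert blocks * (2 * s * s - 1) == blocks * (s * s - s - 1) + n_outside * (s * s - 1)
```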


Corollary: If $e$ and $e+2$ are both odd prime powers, then the design with parameters (4.8) exists with $s = (e+1)/2$.

Taking $e = 9, 11, 17, 23, 25, 27, 29, 41, 47, 59, 71, 79$, we find that the design with parameters (4.8) exists for $s = 5, 6, 9, 12, 13, 14, 15, 21, 24, 30, 36, 40$. The case $s = 5$ corresponds to the design with parameters $(100, 45, 20)$, which Hall (1956) mentions as an unsolved case in difference sets. The results for $s = 6, 12, 24$ already follow from the results of Menon (1962). Theorem 1 can now be used to provide solutions for other values of $s$. For $s < 30$, it is then possible to obtain solutions for all values except $s = 7, 11, 17, 22, 23, 25, 27$ and $29$.
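The list of admissible values of $e$ below 80 can be regenerated by a direct search for pairs of odd prime powers $e, e+2$. The sketch below uses helper names of our own. It also returns $e = 3, 5, 7$ (that is, $s = 2, 3, 4$), which the text does not list.

```python
def is_odd_prime_power(n):
    """True if n = p^k for an odd prime p and k >= 1."""
    if n < 3 or n % 2 == 0:
        return False
    p = next(d for d in range(3, n + 1, 2) if n % d == 0)   # smallest prime factor
    while n % p == 0:
        n //= p
    return n == 1


def twin_prime_power_s(limit):
    """Pairs (e, s) with e, e + 2 odd prime powers and s = (e + 1)/2, for e <= limit."""
    return [(e, (e + 1) // 2) for e in range(3, limit + 1, 2)
            if is_odd_prime_power(e) and is_odd_prime_power(e + 2)]


for e, s in twin_prime_power_s(79):
    print(e, s)
```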
DECOMPOSITION BY BILATERAL COSETS
AND ITS GENERALIZATION 


By MOTOSABURO MASUYAMA 


Tokyo University 
and 
Indian Statistical Institute, Calcutta 


SUMMARY. In previous papers the author (Masuyama, 1961a, 1961b, 1961c and 1961d) introduced (1) the factorial decomposition, (2) the hierarchic decomposition, (3) the symmetric decomposition and (4) the periodic decomposition of a finite module or a finite ring with unity. Each decomposition supplies a family of Partially Balanced Incomplete Block designs if we make one element of a module or of a ring correspond to one variety and vice versa. As proved by the author (Masuyama, 1961d), the periodic decomposition is a refinement of the other decompositions. In Section 1 we generalize the concept of the periodic block, in the sense that ordinary cosets are special cases of bilateral or double cosets in the theory of groups. In Section 2 we generalize further from the viewpoint of mapping, which was noticed by P. K. Menon.*


1. Suppose $\mathcal{R}$ is a ring of order $v$ with unity $e$. Let $\mathcal{G}$ and $\mathcal{H}$ be two multiplicative groups of order $g$ and $h$ respectively contained in this ring, $e$ being the unity of $\mathcal{G}$ and $\mathcal{H}$. $\mathcal{G}$ and $\mathcal{H}$ are not necessarily distinct. If $a_i$, $G_j$ and $H_l$ are elements of $\mathcal{R}$, $\mathcal{G}$ and $\mathcal{H}$ respectively, then $G_ja_iH_l$ is an element of $\mathcal{R}$. We arrange this element in the row $(a_i)$ and in the column $(G_j, H_l)$, for $i = 1, 2, \dots, v$; $j = 1, 2, \dots, g$; $l = 1, 2, \dots, h$. The order in which the headings $(a_i)$ or $(G_j, H_l)$ are arranged is immaterial, so long as every possible case occurs exactly once.


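TABLE 1

$$\begin{array}{c|ccccc}
 & (G_1, H_1) = (e, e) & \cdots & (G_j, H_l) & \cdots & (G_g, H_h) \\ \hline
(a_1) = (e) & e & \cdots & G_jeH_l & \cdots & G_geH_h \\
\vdots & & & & & \\
(a_i) & a_i & \cdots & G_ja_iH_l & \cdots & G_ga_iH_h \\
\vdots & & & & & \\
(a_v) & a_v & \cdots & G_ja_vH_l & \cdots & G_ga_vH_h
\end{array}$$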


*Dr. P. K. Menon, Director, Cypher Bureau, Ministry of Defence, Government of India, drew the author's attention to R. Vaidyanathaswamy's paper 'A remarkable property of the integers mod N, and its bearing on group theory', published in Proc. Ind. Acad. Sci., 5 (1937), 63-75. In that paper Vaidyanathaswamy treated a special case of our periodic block, which corresponds to Fuchs' case in a different notation. See L. Fuchs, 'Ueber die Perioden usw.', Crelle's Journal, 61 (1863), 374-386, and P. Bachmann, Lehre von der Kreistheilung, Leipzig, 1872. Vaidyanathaswamy's method of determining the coefficients which appear in a product of two periodic blocks (in our terminology) is new. However, these coefficients are easily determined by the remark given by Masuyama (1961d) at the end of Section 4, not only in the Fuchsian case but also in any case of periodic blocks generated by elements of a finite ring.
Then each column contains all elements of the ring exactly once, because if we have

$$G_jaH_l = G_jbH_l, \qquad (1)$$

then multiplying by $G_j^{-1}$ from the left and by $H_l^{-1}$ from the right we get

$$a = b. \qquad (2)$$


Identical elements of the ring may appear in the same row more than once. Suppose that exactly $d$ elements in the row $(a)$ are equal to $a$, i.e.

$$a = G_{j_1}aH_{l_1} = G_{j_2}aH_{l_2} = \cdots = G_{j_d}aH_{l_d}. \qquad (3)$$

(i) If $d = gh$, we have

$$a = G_jaH_l \qquad (4)$$

for $j = 1, 2, \dots, g$; $l = 1, 2, \dots, h$.

(ii) If $d < gh$, there are elements which are not equal to $a$. Let any one of them be $GaH$. Multiplying by $G$ from the left and by $H$ from the right, we have by (3)

$$GaH = GG_{j_1}aH_{l_1}H = \cdots = GG_{j_d}aH_{l_d}H, \qquad (5)$$

where the pairs $(GG_{j_m}, H_{l_m}H)$, $m = 1, 2, \dots, d$, are all distinct. Thus $GaH$, reduced to an element of $\mathcal{R}$, appears at least $d$ times in the same row $(a)$. Suppose there were more than $d$ elements equal to $GaH$, and let $G_paH_q$ be one of them other than the $d$ elements in (5). Then we would have

$$a = G^{-1}G_p\,a\,H_qH^{-1}, \qquad (6)$$

with $(G^{-1}G_p, H_qH^{-1}) \neq (G_{j_m}, H_{l_m})$ for $m = 1, 2, \dots, d$. This would make $a$ appear more than $d$ times in the row $(a)$, contrary to our assumption. Thus each distinct element of $\mathcal{R}$ that is contained in the row $(a)$ is contained in it exactly $d$ times.*


A block which contains all elements of one row of Table 1 is called a bilateral coset block, or BC block for short. A block which contains every different element of the same row exactly once is called a normalized BC block. Two BC blocks obtained from different rows are then either identical or disjoint. For if

$$G_maH_n = G_pbH_q \quad \text{for } a \neq b, \qquad (7)$$

we have

$$GaH = GG_m^{-1}G_p\,b\,H_qH_n^{-1}H \quad \text{for any } G \in \mathcal{G} \text{ and } H \in \mathcal{H}. \qquad (8)$$

Thus all elements $GaH$ are contained in the row $(b)$. Similarly, all elements $GbH$ are contained in the row $(a)$. Therefore the sum of all possible non-identical normalized BC blocks contains every element of the ring exactly once.


* The author wishes to express his thanks to Dr. P. K. Menon for kindly pointing out the mistake 
in the proof on this point in the first manuscript. 
We shall now prove that the product of any two BC blocks can be represented as a linear combination of BC blocks with non-negative integer coefficients. In fact we have

$$G_jaH_l + G_mbH_n = G_j\left(a + G_j^{-1}G_m\,b\,H_nH_l^{-1}\right)H_l. \qquad (9)$$

Here, for any fixed $j$ and $l$, the second term in the bracket runs through every element of the row $(b)$ exactly once as $m$ and $n$ range over all combinations. Since

$$c_{pq} = a + G_pbH_q \qquad (10)$$

is an element of the ring $\mathcal{R}$, the block

$$(G_1c_{pq}H_1, \dots, G_jc_{pq}H_l, \dots, G_gc_{pq}H_h) \qquad (11)$$

is a BC block for fixed values of $p$ and $q$. q.e.d.
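The properties proved so far can be illustrated in a small commutative ring:

- each column is a permutation of the ring;
- elements within a row have equal multiplicities;
- rows are identical or disjoint;
- block products decompose as in (9).

The ring $\mathbb{Z}_{13}$ with $\mathcal{G} = \{1, 3, 9\}$ and $\mathcal{H} = \{1, 12\}$ is our own illustrative choice, not an example from the paper. Following (9), the "product" of two blocks is taken to be the multiset of sums $x + y$.

```python
from collections import Counter
from itertools import product

n = 13          # the ring Z_13 (illustrative choice)
G = [1, 3, 9]   # multiplicative group of order g = 3, since 3^3 = 1 mod 13
H = [1, 12]     # multiplicative group of order h = 2, since 12 = -1 mod 13


def row(a):
    """Row (a) of Table 1: the elements G_j a H_l in column order (G_j, H_l)."""
    return [g * a * h % n for g, h in product(G, H)]


# (1)-(2): every column is a permutation of the ring
for g, h in product(G, H):
    assert sorted(g * a * h % n for a in range(n)) == list(range(n))

# (3)-(6): every distinct element of a row occurs the same number d of times
for a in range(n):
    assert len(set(Counter(row(a)).values())) == 1

# (7)-(8): normalized BC blocks are identical or disjoint and partition the ring
blocks = {frozenset(row(a)) for a in range(n)}
assert sum(len(b) for b in blocks) == n

# (9)-(11): the multiset of sums is constant on each normalized block,
# i.e. it is a non-negative integer combination of BC blocks
for b1, b2 in product(blocks, blocks):
    sums = Counter((x + y) % n for x in b1 for y in b2)
    for b in blocks:
        assert len({sums[z] for z in b}) == 1

print(sorted(sorted(b) for b in blocks))
```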


The periodic block is a special case of our bilateral coset block, in which one of $\mathcal{G}$ and $\mathcal{H}$ consists of the single element $e$.

2. The above result is easily generalized once we realize that the essential features of the proof are:

(i) the multiplicative group property of the transformations, or mappings, $\tau_j$ of an element $a$ (here $G_jaH_l$); and

(ii) the distributive property of the transformation, i.e. the isomorphism between $a$ and $\tau_ja$.

The first property is used in obtaining formulas (2), (5), (6) and (8), and the second in obtaining formula (9). The existence of a unity in $\mathcal{R}$, which we assumed in Section 1, is needed only when we use the group property of the mappings.

Now let $\mathcal{M}$ be a finite module of order $v$, and let $\tau_ja$ be a mapping of $a$ in $\mathcal{M}$ into $\mathcal{M}$. Then all the mappings $\tau_j$ constitute a semigroup $\mathfrak{T}(\mathcal{M})$, whose invertible elements are the bijective mappings. All the bijective mappings constitute a symmetric group, i.e. a substitution group $\mathfrak{S}(\mathcal{M})$, in which the automorphic mappings constitute a subgroup $\mathfrak{A}(\mathcal{M})$. Any subgroup of $\mathfrak{A}(\mathcal{M})$ can be used to generate a Partially Balanced Incomplete Block design. Table 1 in Section 1 is then replaced by the following table:


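TABLE 2

$$\begin{array}{c|ccccc}
 & \tau_1 & \cdots & \tau_j & \cdots & \tau_g \\ \hline
(a_1) & \tau_1a_1 & \cdots & \tau_ja_1 & \cdots & \tau_ga_1 \\
\vdots & & & & & \\
(a_i) & \tau_1a_i & \cdots & \tau_ja_i & \cdots & \tau_ga_i \\
\vdots & & & & & \\
(a_v) & \tau_1a_v & \cdots & \tau_ja_v & \cdots & \tau_ga_v
\end{array}$$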


The set of all elements in a row constitutes a block, in which all distinct elements appear with the same frequency, say $d$ times. A block which is derived from one row and contains each distinct element exactly once is called a normalized block. A block obtained in this way may be qualified by the specific mapping used.
3. We shall illustrate our method with one of the commonest transformations, namely conjugation. Suppose that $a$ is an element of a ring $\mathcal{R}$ of order $v$ with unity, and that $G_j$ is an element of a multiplicative group $\mathcal{G}$ of order $g$ contained in $\mathcal{R}$. Then the conjugated block

$$\{G_1aG_1^{-1}, \dots, G_jaG_j^{-1}, \dots, G_gaG_g^{-1}\}, \qquad (12)$$

or its sum, generates a Partially Balanced Incomplete Block design by multiplying $\{s\}$, $s$ being every element of $\mathcal{R}$. The mapping 'conjugation' satisfies the two conditions stated in Section 2.


Example: Consider the matrix ring of order 16 (see Appendix) quoted in Masuyama (1961d). Let $\mathcal{G}$ be $(12, 23, 31, 32, 21, 13)$, of which $(12, 23, 31)$, $(12, 13)$, $(12, 21)$ and $(12)$ are subgroups with respect to multiplication. The normalized conjugated blocks obtained from $\mathcal{G}$ are as follows:


E = {00},
A = {12},
B = {23, 31},
C = {32, 21, 13},
D = {01, 20, 33},
F = {02, 03, 10, 11, 30, 22}.


All these blocks are self-conjugate. The multiplication table of these blocks is given in Table 3.


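TABLE 3

$$\begin{array}{c|cccccc}
 & E & A & B & C & D & F \\ \hline
E & E & & & & & \\
A & A & E & & & & \\
B & B & B & 2E+2A & & & \\
C & C & D & F & 3E+2C & & \\
D & D & C & F & 3A+2D & 3E+2C & \\
F & F & F & 2C+2D & 3B+2F & 3B+2F & 6E+6A+4C+4D
\end{array}$$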


There are simple linear relations between these blocks and the periodic blocks (Masuyama, 1961d), i.e.

$$P_1 = A + B, \qquad P_2 + P_3 = D + F \qquad \text{and} \qquad P_4 = C.$$

By setting $D_1 = A + C + F$ and $D_2 = B + D$,
we obtain the following multiplication table:

TABLE 4

$$\begin{array}{c|ccc}
 & E & D_1 & D_2 \\ \hline
E & E & & \\
D_1 & D_1 & 10E + 6(D_1 + D_2) & \\
D_2 & D_2 & 3D_1 + 4D_2 & 5E + 2D_1
\end{array}$$


The initial block $D_1$ and the initial block $(E + D_2)$ supply two complementary Balanced Incomplete Block designs. The initial block $D_2$ supplies a Partially Balanced Incomplete Block design with the following parameters of the first kind:

$$v = b = 16, \quad k = r = 5, \quad n_1 = 10, \quad n_2 = 5, \quad \lambda_1 = 2 \quad \text{and} \quad \lambda_2 = 0.$$

The parameters of the second kind are given in Table 4. Clatworthy (1956) gives a Partially Balanced Incomplete Block design with the same parameters of the first and second kinds; however, his design is not cyclic.
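Tables 3 and 4 can be recomputed from the encoding stated in the Appendix. There, $(i, j)$ stands for the $2 \times 2$ matrix over GF(2) with $i = c_{11} + 2c_{12}$ and $j = c_{21} + 2c_{22}$, so addition is a bitwise XOR of the two row codes. The Python sketch below first finds the six invertible matrices. It then forms the conjugated blocks (12) and writes each block product (the multiset of sums, as in (9)) as a combination of blocks. The names are ours, and the sketch is a check rather than the author's procedure.

```python
from collections import Counter

ELEMS = [(i, j) for i in range(4) for j in range(4)]   # (i, j) is "ij" in the Appendix
UNIT = (1, 2)                                          # "12" is the identity matrix


def add(m, n):
    # entrywise addition mod 2 = XOR of the 2-bit row codes
    return (m[0] ^ n[0], m[1] ^ n[1])


def mul(m, n):
    # row code r = c_k1 + 2 c_k2 of m selects rows of n: c_k1 (row 1 of n) + c_k2 (row 2 of n)
    def row(r):
        return (n[0] if r & 1 else 0) ^ (n[1] if r & 2 else 0)
    return (row(m[0]), row(m[1]))


G = [g for g in ELEMS if any(mul(g, x) == UNIT for x in ELEMS)]   # invertible matrices
INV = {g: next(x for x in ELEMS if mul(g, x) == UNIT) for g in G}


def conjugated_block(a):
    """Normalized conjugated block {G_j a G_j^-1} of formula (12)."""
    return frozenset(mul(mul(g, a), INV[g]) for g in G)


REPS = {"E": (0, 0), "A": (1, 2), "B": (2, 3), "C": (3, 2), "D": (0, 1), "F": (0, 2)}
BLOCKS = {name: conjugated_block(rep) for name, rep in REPS.items()}


def block_product(x, y, parts):
    """Write the multiset {u + v : u in x, v in y} as a combination of the blocks in parts."""
    sums = Counter(add(u, v) for u in x for v in y)
    combo = {}
    for name, blk in parts.items():
        (c,) = {sums[z] for z in blk}    # raises unless the count is constant on the block
        if c:
            combo[name] = c
    return combo


# Table 4 uses D1 = A + C + F and D2 = B + D
PARTS4 = {"E": BLOCKS["E"],
          "D1": BLOCKS["A"] | BLOCKS["C"] | BLOCKS["F"],
          "D2": BLOCKS["B"] | BLOCKS["D"]}

print(block_product(BLOCKS["F"], BLOCKS["F"], BLOCKS))
print(block_product(PARTS4["D2"], PARTS4["D2"], PARTS4))
```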


Appendix: Matrix ring of order 16

THE ADDITION TABLE

 +  | 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
 00 | 00
 01 | 01 00
 02 | 02 03 00
 03 | 03 02 01 00
 10 | 10 11 12 13 00
 11 | 11 10 13 12 01 00
 12 | 12 13 10 11 02 03 00
 13 | 13 12 11 10 03 02 01 00
 20 | 20 21 22 23 30 31 32 33 00
 21 | 21 20 23 22 31 30 33 32 01 00
 22 | 22 23 20 21 32 33 30 31 02 03 00
 23 | 23 22 21 20 33 32 31 30 03 02 01 00
 30 | 30 31 32 33 20 21 22 23 10 11 12 13 00
 31 | 31 30 33 32 21 20 23 22 11 10 13 12 01 00
 32 | 32 33 30 31 22 23 20 21 12 13 10 11 02 03 00
 33 | 33 32 31 30 23 22 21 20 13 12 11 10 03 02 01 00
THE MULTIPLICATION TABLE

 ×  | 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
 00 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 01 | 00 00 00 00 01 01 01 01 02 02 02 02 03 03 03 03
 02 | 00 01 02 03 00 01 02 03 00 01 02 03 00 01 02 03
 03 | 00 01 02 03 01 00 03 02 02 03 00 01 03 02 01 00
 10 | 00 00 00 00 10 10 10 10 20 20 20 20 30 30 30 30
 11 | 00 00 00 00 11 11 11 11 22 22 22 22 33 33 33 33
 12 | 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
 13 | 00 01 02 03 11 10 13 12 22 23 20 21 33 32 31 30
 20 | 00 10 20 30 00 10 20 30 00 10 20 30 00 10 20 30
 21 | 00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33
 22 | 00 11 22 33 00 11 22 33 00 11 22 33 00 11 22 33
 23 | 00 11 22 33 01 10 23 32 02 13 20 31 03 12 21 30
 30 | 00 10 20 30 10 00 30 20 20 30 00 10 30 20 10 00
 31 | 00 10 20 30 11 01 31 21 22 32 02 12 33 23 13 03
 32 | 00 11 22 33 10 01 32 23 20 31 02 13 30 21 12 03
 33 | 00 11 22 33 11 00 33 22 22 33 00 11 33 22 11 00

$(i, j)$ stands for the matrix $\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$ with $i = c_{11} + 2c_{12}$ and $j = c_{21} + 2c_{22}$, where each $c_{kl}$ is an element of the module of integers modulo 2, i.e. 0 or 1.
By G. S. JAMES and ALAN J. MAYNE
University of Leeds 


SUMMARY. A combinatorial method is described for obtaining the joint cumulants of an arbitrary number of functions of an arbitrary number of variables, the cumulants of the variables being known. The important case in which all $r$-th order cumulants of the independent variables are small quantities of order $(r-1)$ is treated in detail. Explicit results are given at least as far as small quantities of the fourth order. Proofs are only outlined, since they are implicit in previous work.


Let $x$ be a random variable all of whose cumulants $\kappa_r$ exist. Let $y$ be a function of $x$ expansible in a Taylor series,

$$y = A_0 + A_1x + A_2x^2 + \cdots. \qquad (1)$$

Then, under certain circumstances, and in particular when $y$ is a polynomial in $x$, all the cumulants $K_r$ of $y$ will exist. When $y$ is a polynomial, they may be evaluated via the moments, using (1). For example, if $A_3 = A_4 = \cdots = 0$ we have

$$K_1 = A_0 + A_1\kappa_1 + A_2(\kappa_2 + \kappa_1^2),$$
$$K_2 = A_1^2\kappa_2 + 2A_1A_2(\kappa_3 + 2\kappa_1\kappa_2) + A_2^2(\kappa_4 + 4\kappa_1\kappa_3 + 2\kappa_2^2 + 4\kappa_1^2\kappa_2),$$

and so on. When $y$ is not a polynomial, the cumulants $K_r$ may not exist;* even if they do, their expressions will be complicated infinite series involving all the $A_r$ and $\kappa_r$. To get usable results we make three assumptions. Each $\kappa_r$ is of order $\nu^{-r+1}$ in some large number $\nu$. The $A_r$ are independent of $\nu$ (or at least of order 1). In the expressions for the $K_r$ we retain terms only up to a given order in $\nu$.
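The two expressions for $K_1$ and $K_2$ can be checked by brute force. First compute the raw moments of $x$ from its cumulants with the standard recursion $m_n = \sum_{j=1}^{n} \binom{n-1}{j-1}\kappa_j m_{n-j}$. Then compute the mean and variance of $y$ directly. The Python sketch below does this with exact fractions; the function names are ours.

```python
from fractions import Fraction
from math import comb


def moments_from_cumulants(k, nmax):
    """Raw moments m_0..m_nmax from cumulants k[1..nmax]: m_n = sum_j C(n-1, j-1) k_j m_(n-j)."""
    m = [Fraction(1)]
    for n in range(1, nmax + 1):
        m.append(sum(comb(n - 1, j - 1) * k[j] * m[n - j] for j in range(1, n + 1)))
    return m


def quadratic_cumulants(A0, A1, A2, k):
    """K1, K2 of y = A0 + A1 x + A2 x^2, computed directly from the moments of x."""
    m = moments_from_cumulants(k, 4)
    Ey = A0 + A1 * m[1] + A2 * m[2]
    Ey2 = (A0**2 + 2 * A0 * A1 * m[1] + (A1**2 + 2 * A0 * A2) * m[2]
           + 2 * A1 * A2 * m[3] + A2**2 * m[4])
    return Ey, Ey2 - Ey**2


def quadratic_cumulants_formula(A0, A1, A2, k):
    """The expressions for K1 and K2 given in the text."""
    k1, k2, k3, k4 = k[1], k[2], k[3], k[4]
    K1 = A0 + A1 * k1 + A2 * (k2 + k1**2)
    K2 = (A1**2 * k2 + 2 * A1 * A2 * (k3 + 2 * k1 * k2)
          + A2**2 * (k4 + 4 * k1 * k3 + 2 * k2**2 + 4 * k1**2 * k2))
    return K1, K2


kappa = [None, Fraction(1, 2), Fraction(3), Fraction(-2), Fraction(5)]   # arbitrary kappa_1..kappa_4
print(quadratic_cumulants(7, -3, 2, kappa), quadratic_cumulants_formula(7, -3, 2, kappa))
```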


In a previous article (James, 1955) one of us outlined a combinatorial method which can be used to carry out the calculations. It was used there to prove the non-trivial result that $K_r$ is also of order $\nu^{-r+1}$. The proof depended, in what may seem a rather weird way, on a result published three years later (James, 1958; referred to in the previous paper as "James, 1955, not yet published"). Here we explain the method in more detail and sketch a complete proof. We also provide explicit results as far as terms of order $\nu^{-4}$, together with the terms of order $\nu^{-5}$ in $K_6$.


It turns out to be simpler to generalize (1) by considering $q$ variables $y^1, \dots, y^q$, expressed in terms of $p$ variables $x^1, \dots, x^p$ by series of the type

$$y^a = A^a + A^a_ix^i + A^a_{ij}x^ix^j + \cdots \qquad (A^a_{ij} = A^a_{ji}, \text{ etc.}). \qquad (2)$$

The superfixes are indices, not exponents. Throughout, we use the convention that indices repeated as suffix and superfix, such as $i, j, \dots$ above, are summed over $i, j, \dots = 1, \dots, p$. We shall use a slightly modified form of the "tensor" notation suggested


*A sufficient condition for their existence is that $x$ is a bounded random variable whose bounds are numerically less than the radius of convergence of (1). It is not sufficient that $y$ is an integral (entire) function of $x$, as is seen by taking $x$ to be a standard normal deviate and $y = \exp x^2$.
by Kaplan (1952). In it, the first, second, ... order multivariate cumulants of the $x$ are $\kappa^i, \kappa^{ij}, \dots$, and those of the $y^a$ are $K^a, K^{ab}, \dots$ $(a, b, \dots = 1, \dots, q)$. For example,

$$\kappa^i = Ex^i, \qquad \kappa^{ij} = E\big[(x^i - Ex^i)(x^j - Ex^j)\big] = \operatorname{cov}(x^i, x^j) \quad [= \operatorname{var} x^i \text{ if } i = j]. \qquad (3)$$

It is assumed that all the $r$-th order cumulants are of order $\nu^{-r+1}$. Results for the univariate case (1) can be obtained by taking $p = q = 1$ and writing $\kappa_1, \kappa_2, \dots$ for $\kappa^1, \kappa^{11}, \dots$, and similarly for the $K$'s.
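We now write, temporarily, $z^0, z^1, \dots$ for the expressions $1, x^1, \dots, x^1x^1, \dots, x^px^p, x^1x^1x^1, \dots$, and rewrite (2) in the form

$$y^a = A^a_\alpha z^\alpha. \qquad (4)$$

The relations $Ey^a = A^a_\alpha Ez^\alpha$, $Ey^ay^b = A^a_\alpha A^b_\beta Ez^\alpha z^\beta, \dots$, together with moment-cumulant relations analogous to (3), immediately give

$$K^a = A^a_\alpha\,\kappa(z^\alpha), \qquad K^{ab} = A^a_\alpha A^b_\beta\,\kappa(z^\alpha, z^\beta), \quad \dots \qquad (5)$$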
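We now replace the $z$'s by their meanings as $x$-products and introduce a notation in which, for example, $\kappa[ij, k]$ denotes the second-order cumulant of $x^ix^j$ and $x^k$. We then have, for example,

$$\begin{aligned}
K^{abc} = {}& A^a_iA^b_jA^c_k\,\kappa[i,j,k] + \textstyle\sum^3 A^a_{ij}A^b_kA^c_l\,\kappa[ij,k,l] \\
&+ \Big(\textstyle\sum^3 A^a_{ijk}A^b_lA^c_m\,\kappa[ijk,l,m] + \sum^3 A^a_{ij}A^b_{kl}A^c_m\,\kappa[ij,kl,m]\Big) \\
&+ \Big(\textstyle\sum^3 A^a_{ijkl}A^b_mA^c_n\,\kappa[ijkl,m,n] + \sum^6 A^a_{ijk}A^b_{lm}A^c_n\,\kappa[ijk,lm,n] \\
&\qquad + A^a_{ij}A^b_{kl}A^c_{mn}\,\kappa[ij,kl,mn]\Big) + \cdots. \qquad (6)
\end{aligned}$$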
In this expression we have written

$$\textstyle\sum^3 A^a_{ij}A^b_kA^c_l\,\kappa[ij,k,l] \quad \text{for} \quad (A^a_{ij}A^b_kA^c_l + A^b_{ij}A^c_kA^a_l + A^c_{ij}A^a_kA^b_l)\,\kappa[ij,k,l],$$

that is, the sum over all (three) distinct terms obtained by permuting $a, b, c$. Note that

$$A^a_{ij}A^c_kA^b_l\,\kappa[ij,k,l] = A^a_{ij}A^b_kA^c_l\,\kappa[ij,k,l];$$

thus interchanging the superfixes of $A$ coefficients with equal numbers of suffixes produces no new terms. Hence, suppose a particular summation is of terms which have $p_1$ $A$'s with $r_1$ suffixes, ..., $p_q$ $A$'s with $r_q$ suffixes $(r_1 > r_2 > \cdots > r_q)$. Then there are

$$\frac{p!}{p_1!\,p_2!\cdots p_q!} \qquad (7)$$

terms in the sum, where $p = \sum p_i$.
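Formula (7) can be checked by brute force. The sketch enumerates every assignment of the superfixes $a, b, \dots$ to the $A$ coefficients. A term is determined by which superfix sits on an $A$ with a given number of suffixes, so assignments that differ only by swapping superfixes between $A$'s with equal numbers of suffixes are identified. The function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def distinct_terms(suffix_counts):
    """Number of distinct terms when superfixes a, b, ... are permuted over the A's.

    suffix_counts[t] is the number of suffixes on the t-th A coefficient.
    """
    superfixes = "abcdefghij"[: len(suffix_counts)]
    seen = set()
    for perm in permutations(superfixes):
        # a term is determined by the pairs (superfix, number of suffixes on its A)
        seen.add(frozenset(zip(perm, suffix_counts)))
    return len(seen)


def formula_7(suffix_counts):
    """p! / (p_1! p_2! ... p_q!), p_t = number of A's sharing a suffix count."""
    p = len(suffix_counts)
    return factorial(p) // prod(factorial(c) for c in Counter(suffix_counts).values())


for counts in [(2, 1, 1), (3, 2, 1), (2, 2, 2), (4, 1, 1, 1), (2, 2, 1, 1)]:
    print(counts, distinct_terms(counts), formula_7(counts))
```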


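It remains to express the cumulants $\kappa[\;]$ of $x$-products in terms of those of the $x$'s. We proceed via the corresponding moments $\alpha[\;]$. We have, for example,

$$\alpha[ij,k,l] = E(x^ix^j\,x^k\,x^l) = \kappa^{ijkl} + \textstyle\sum^4 \kappa^{ijk}\kappa^l + \sum^3 \kappa^{ij}\kappa^{kl} + \sum^6 \kappa^{ij}\kappa^k\kappa^l + \kappa^i\kappa^j\kappa^k\kappa^l, \qquad (8)$$

$$\kappa[ij,k,l] = \alpha[ij,k,l] - \big(\alpha[ij,k]\,\alpha[l] + \alpha[ij,l]\,\alpha[k] + \alpha[k,l]\,\alpha[ij]\big) + 2\alpha[ij]\,\alpha[k]\,\alpha[l] \qquad (9)$$

$$\phantom{\kappa[ij,k,l]} = \kappa^{ijkl} + \textstyle\sum^2 \kappa^{ikl}\kappa^j + \sum^2 \kappa^{ik}\kappa^{jl}. \qquad (10)$$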
We can summarize the terms in (8) by means of bipartitions, or arrays. Below they are listed by their column totals; the row totals are $ij$, $k$, $l$ throughout. The number of terms of each type is given in brackets:

I: $ijkl$ [1]
IIa: $ijk \mid l$ [2]
IIb: $ikl \mid j$ [2]
IIIa: $ij \mid kl$ [1]
IIIb: $ik \mid jl$ [2]
IVa: $ij \mid k \mid l$ [1]
IVb: $ik \mid j \mid l$ [4]
IVc: $kl \mid i \mid j$ [1]
V: $i \mid j \mid k \mid l$ [1]


In these arrays the row totals correspond to the left-hand side and the column totals to the various terms on the right; the numerals show the number of terms of each type (Fisher, 1930; Kaplan, 1952*). Note that the numerical coefficients in (8) are all unity, and that there is one term for every possible way of splitting the row totals.

Subtracting the three terms in parentheses in (9) removes the terms of (8) corresponding to arrays IIa, IIIa, IVb and IVc. These arrays are dissectable, or unconnected, and fall into two blocks. Terms corresponding to IVa and V, which fall into three blocks, are subtracted three times. This is put right when we add $2\alpha[ij]\,\alpha[k]\,\alpha[l] = 2(\kappa^{ij} + \kappa^i\kappa^j)\kappa^k\kappa^l$. The net result is that in (10) we retain only terms corresponding to connected, or non-dissectable, arrays.
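The rule "retain only connected arrays" can be illustrated by enumeration. The Python sketch below lists all 15 set partitions of the letters $i, j, k, l$, i.e. the possible sets of column totals. It keeps those partitions that link the rows $ij$, $k$, $l$ into a single block, and should reproduce the five terms of (10). This is an illustration written for this note, not the proof in James (1958).

```python
def set_partitions(items):
    """All set partitions of a list, generated recursively."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def connected(rows, columns):
    """True if letters linked along rows and columns form a single block (union-find)."""
    parent = {x: x for r in rows for x in r}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in list(rows) + list(columns):
        for x in group[1:]:
            parent[find(x)] = find(group[0])
    return len({find(x) for x in parent}) == 1


rows = [["i", "j"], ["k"], ["l"]]                 # the cumulant kappa[ij, k, l]
letters = [x for r in rows for x in r]
terms = [p for p in set_partitions(letters) if connected(rows, p)]
for p in terms:
    print(" ".join("".join(sorted(block)) for block in p))
```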


It is not quite obvious that this cancellation always takes place exactly. The proof is mainly a matter of devising a suitable notation, and is given in James (1958, Theorem 6.1). Essentially one uses relations of the type

$$\alpha[ij,k,l] = \kappa[ij,k,l] + \kappa[ij,k]\,\kappa[l] + \kappa[ij,l]\,\kappa[k] + \kappa[k,l]\,\kappa[ij] + \kappa[ij]\,\kappa[k]\,\kappa[l], \qquad (11)$$

in which all coefficients are $+1$, and proceeds by induction on the order of the highest cumulant involved.


In order to obtain the $K$'s (to a given order in $\nu$) in finite terms, we shall assume that

$$\kappa^i = 0 \qquad (i = 1, \dots, p). \qquad (12)$$

More generally, we could take, say, $\kappa^i = O(\nu^{-1})$, but the results would be more complicated. With this proviso, to determine all terms of order $\nu^{-k}$ in a cumulant $K^{ab\cdots}$ of order $p$, we have to consider all arrays which satisfy the following conditions.

(i) Number of rows $= p$.

(ii) (Number of letters) $-$ (number of columns) $= \sum$ (number of letters in column total $- 1$) $= k$.

(iii) There are at least two letters in each column total.

(iv) The array is connected.

*We have, however, reversed the row-column convention used by these authors. The present convention accords better with matrix notation.


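Of course, arrays obtained by mere permutations of rows or of columns do not count afresh. For example, three arrays give terms of order $\nu^{-2}$ in $K^{ab}$. They have rows $i \mid jk$ with the single column $ijk$; rows $ij \mid kl$ with columns $ik, jl$; and rows $i \mid jkl$ with columns $ij, kl$. Accordingly,

$$K^{ab} = \big(A^a_iA^b_j\kappa^{ij}\big) + \Big(\textstyle\sum^2 A^a_iA^b_{jk}\kappa^{ijk} + 2A^a_{ij}A^b_{kl}\kappa^{ik}\kappa^{jl} + 3\sum^2 A^a_iA^b_{jkl}\kappa^{ij}\kappa^{kl}\Big) + O(\nu^{-3}).$$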


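The factor 2 in the second term arises because

$$A^a_{ij}A^b_{kl}\textstyle\sum^2 \kappa^{ik}\kappa^{jl} = A^a_{ij}A^b_{kl}\left(\kappa^{ik}\kappa^{jl} + \kappa^{il}\kappa^{jk}\right) = 2A^a_{ij}A^b_{kl}\kappa^{ik}\kappa^{jl}.$$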


Similar, though not always complete, conversions of the inner summation into a numerical factor occur in other terms. The general apportionment between $N$ and $S$ (the numerical factor and the number of terms in the sum) may be set out as follows.


The univariate image of an array is obtained by replacing each element of the array by the number of letters it contains. For example:

[Two example arrays with their univariate images; their layout is not legible in the source scan.]


In this paragraph the expressions "array", "row", "column" and "element" always refer to the image array. Two rows (or columns) are said to be identical if they contain the same elements in the same positions. Two arrays are doubly related if each can be obtained from the other both by a permutation of rows only and by a permutation of columns only. (It follows that their row totals and column totals are identical.) The degeneracy of an array, denoted by $D$, is the number of distinct arrays (including itself) which are doubly related to it. The numerical factor $N$ is the product of $p$ multinomial coefficients, one for each row, divided by a product




CUMULANTS OF FUNCTIONS OF RANDOM VARIABLES 


of factorials, one for each group of identical columns. The number of terms in the sum, $S$, is $p!$ divided by a product of factorials, one for each group of identical rows, and also divided by $D$. For the two arrays quoted we have, respectively, $D = 1$, $N = 3 \cdot 6 \cdot 1/(1!\,1!\,1!) = 18$, $S = 3!/(1!\,1!\,1!\,D) = 6$, and $D = 2$, $N = 12 \cdot 12/(1!\,2!\,1!) = 72$, $S = 2!/D = 1$.

A connected array is, by definition, one in which it is possible to proceed from any element to any other along rows and columns without passing over an empty place. Consequently, it may be built up letter by letter in such a way that each new letter (except the first) starts a new row, a new column, or neither, but never both. Hence, if there are $p$ rows, $\tau$ columns and $w$ letters in the array, we have

$$w \geq 1 + (p-1) + (\tau-1) = p + \tau - 1. \qquad (13)$$

Thus the order of magnitude of the $\kappa$ product, which is precisely $\nu^{-k} = \nu^{-w+\tau}$, is $\nu^{-p+1}$ or lower, and the same is therefore true for $K^{ab\cdots}$.


RESULTS 


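We enclose terms of orders $\nu^0, \nu^{-1}, \dots$ in successive braces:

$$\begin{aligned}
K^a = {}& \{A^a\} + \{A^a_{ij}\kappa^{ij}\} + \{A^a_{ijk}\kappa^{ijk} + 3A^a_{ijkl}\kappa^{ij}\kappa^{kl}\} \\
&+ \{A^a_{ijkl}\kappa^{ijkl} + 10A^a_{ijklm}\kappa^{ijk}\kappa^{lm} + 15A^a_{ijklmn}\kappa^{ij}\kappa^{kl}\kappa^{mn}\} \\
&+ \{A^a_{ijklm}\kappa^{ijklm} + 15A^a_{ijklmn}\kappa^{ijkl}\kappa^{mn} + 10A^a_{ijklmn}\kappa^{ijk}\kappa^{lmn} \\
&\qquad + 105A^a_{ijklmnr}\kappa^{ijk}\kappa^{lm}\kappa^{nr} + 105A^a_{ijklmnrs}\kappa^{ij}\kappa^{kl}\kappa^{mn}\kappa^{rs}\} + O(\nu^{-5}),
\end{aligned}$$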


Ке ASAP 4 Де AL RIEL (8 È A5, 41-248, Ак) 
S AG, 1-48 42) OL (4 È AG AY +6 È Аф d 
. 2 
4.63 Ag Ab, 4:3 X AG ARMM (15 X Аз „А5 H12 È Ар А}, 


D ЭА Ань ds CAm Ahn) k K mN Ta 5 Aga Abd b Al Abn) km 


6 È Aga, 48410 Abel +8 È А, At +6 È 4t, 

з Аль, + 9A Abin) KRM (10 X 49,4574 È Ab Abn 

+6 Ў Аб А5 945,451) КК" - (60 È AS ить 

$45 È Amr A220 È Ag At, +60 È Ань АВ, 15 È At Ahe 

412 дед 18 E Agm Aba, +36 È AS, Ab ny +36 È At, 40, К Немки 


2 
+(105 2 Аф ити As +90 Ў ArmA nikao È Ат 


+60 5 Afr ins n T2AF mAbs t 2448,,,, AD, rime +0( sd 
Kite — дед®дүк®®--э. X, Ab AL Ane CX Ap Ad Аскон
+8 X Ag At Ac, +3 Ў A8, 4045-12 E ААР An +4 È ABAD, Ag m 
+2 È Agi, 4046-6 È де 40, 46-6 È Ag, AS 
ВАЗ АВ Af, TRB mM) i È as, pae + $ 45.484, уком" 
+(4 È Аа А846 +6 X Agn AAH Ўл А43 È Aig ABA 
+6 È 45,45, Af +3 È Ае AAA È Ае AL, Af) tem 
ДАЕ 6 È Ар АБАЕВ È Ае Ар АСЗ X At, A, At 
део n HAARA Aink EKIN а (90 X At ALAS 
+30 È Ау„„А}А;+Н15 ® Арни} È Афы4А5,„А;--12 È Аф LA; 
+4 3 Аф„А„А;+-12 È Ар, А5 А12 È А Af Ar È А AL AT 
+18 3 45,45, Ac--9 E Ag A0, AHI 548,40, Ар E At, AP, Ai 
+18 E 48,45, Ai H12 E Ag ab де +12 È Ag, ap At, F6 È А „АА, 
+24 È АА Ар, 6 Ў Ag „АА; emer 
4090 È Ары „,А%А:--30 È Ati, Ad, 41-100 È Ag, Ab, 43 
+36 È Ag, AL, 41-90 ® А6 „А6 42-124 X At, АВ AE 
+48 3 4,4 At, H24 ЗАА, Ah AE, 418 È At Ad, At 
418 € Afi Abin, AG, +36 b At, Ab, AS.) kk eme rt |-О(у-5), 
Ketel — АТАБАСА кан 129. At AP Ap Ad ems (6 X Ap SATA 
+43 дуд} ATA Dye iem C3 дедь деда em. 4 (3 $ ААА АФ 
+3 È дл}Аүл{ ++ Ў Ag AL, Att Ае Ap Ag Ape 
4X Ag AtA A2 4-2 5; А545 А A144 X АЗА „4 Ad) ick imn 
+(12 È Ag, Ард A1412 È да, „дедов Ў As, Ab, A541 


24 
+6 > A845, 4241-16 È A5, A5 AAt 6 Ў АА, Ас, Ad 


24 
+12 È 45,4), 4541-6 È Ag Ab, злее È At, 40 ASAS 
24 12 12 :
+4 È 4545 Аг „А4-Е8 DASA? 45,414-8 X. АА А AT) kr emen 


jn mr 


4 24 12 
+(60 È А8 „48449-24 È Ab pm A? 4540-24 Ў At, 454547 


djkmr 


12 12 24 
T18 X EU 2444-18 x А Ab Ap Ag+12 X Ag Ab. A240 


24 12 
+12 E 48 и А9 49,44 --24 > Alm А), АА 


3 
+16 X A1, A! 


jm 


At. 41.) kü Kal кт" к" --0(у-5), 


Касае = КАТА АГАТА КЎ" 2 Ў Atn A? AGAFA, кіт" 
15 30 
+2 X АңА}А А Anki (6 > А.А АРА, А, 
60 60 n 
+4 E 42,4) A5 А Ае --4 > AGA’, ASA AS Окт" 


$(24 È Ag, APAALAt 4-12 È Ag A, ALAS А; 


8 È Ag, 41, At, ALAS kö ктт) LO(y-5), 


i / 30 ў . 
Касае? — (ATA) AL AQ А А к#п 2 > At, ALAC AJA AT Khim cnr 


0 
12 È Ag, AJAG AJA AT remm L(6 Y A, AMAT ATAS AL 


180 120 
+4 X AL A? As AIAS AIHA X At, Ab AL ALATAT) KiB ming rt 


i0 
(6 $ As, ATALAT, ABAI HA АБВ АТО As AL 
60 

+4 $ As ALAS AL As, Af) нк кт (4 E Ag, АЗАДА AT AT 

| 
4-12 5 де 4) 441,42 A{+12 $ At, Ah, ASALATAL 

120 
4-123 Ag, AL, AT ALAS ALH8 È A84), А, СИИИ 

T b b d Ae Af yc кт gen ett 
+8 È 4645 Az, ALACALL8 Х A845, A5, A] AS Ap RMI en 
120 

40120 $ а AVAPASALAL+ 48 È АБ AL АГА АРА 


360 
+36 $ as, Ab AAT AAAA 7 АА, Ap AIALA, 


+4 È Ag, A,A, APARAN 


ав Ag A, At, Ad, AL ADI KPKK KA -O (y). 

In these expressions the summation signs refer to those permutations of $a, b, \dots$ which produce distinct terms (see above).
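For the special case of a single function of a single variable, these formulae become

$$\begin{aligned}
K_1 = {}& \{A_0\} + \{A_2\kappa_2\} + \{A_3\kappa_3 + 3A_4\kappa_2^2\} + \{A_4\kappa_4 + 10A_5\kappa_3\kappa_2 + 15A_6\kappa_2^3\} \\
&+ \{A_5\kappa_5 + 15A_6\kappa_4\kappa_2 + 10A_6\kappa_3^2 + 105A_7\kappa_3\kappa_2^2 + 105A_8\kappa_2^4\} + O(\nu^{-5}),
\end{aligned}$$

$$\begin{aligned}
K_2 = {}& \{A_1^2\kappa_2\} + \{2A_1A_2\kappa_3 + (6A_1A_3 + 2A_2^2)\kappa_2^2\} \\
&+ \{(2A_1A_3 + A_2^2)\kappa_4 + (20A_1A_4 + 18A_2A_3)\kappa_3\kappa_2 + (30A_1A_5 + 24A_2A_4 + 15A_3^2)\kappa_2^3\} \\
&+ \{(2A_1A_4 + 2A_2A_3)\kappa_5 + (30A_1A_5 + 28A_2A_4 + 15A_3^2)\kappa_4\kappa_2 + (20A_1A_5 + 20A_2A_4 + 9A_3^2)\kappa_3^2 \\
&\qquad + (210A_1A_6 + 190A_2A_5 + 204A_3A_4)\kappa_3\kappa_2^2 \\
&\qquad + (210A_1A_7 + 180A_2A_6 + 210A_3A_5 + 96A_4^2)\kappa_2^4\} + O(\nu^{-5}),
\end{aligned}$$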


К: = adiks E64, Арка) + ЗА ATI НОТА, А-24ААЗА кк, 
: 1 (364,41--724,4,4,-- 84) 4- (34541-3434), 
© EU24,4H-844,4,4, 4-12 9) s -(304,424-544,4,4, I-104])k2 

"(3854,41 --5524,4,4, 4-297 44, --2524, 43) (2704 542--5404,4,4, 
7-8764,4,4, -2164,43--27043 44) 8 --O(v-5), 

К. = AAs + 24A Tek + (944,41 484242) ) (44540 
+484,41-4-7242Аў)к Ka (364,42 1-484242) 
+ (2884.41--9364,4,41--2884} 4.) ,к -(3404,42-1-8044,4,4 
7-432 45 41--8614, 454, --48 А) к + 0(y-5), 

K,— «Ак 40A, Ad es 304,411 --(1804,41-1480424]) ied 
7F (1204,41 1-7204,4, 43-1- 48042 42)3) --O(v-5), 


Ke. Aiko +604 S TK s -1204, AG qs (3604, AT-- 1,204243) K? 
TE6404547-I 1,800484) ji, I (1,4404, 43--10,8004,4,41 
779,600434)k,Kdl-- (7204; 41-5, 7604 ,4,41--3,2404241 
7-17,2804,4243-1-5,76042 42) к} 4-O(v-*). 
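The univariate expansions above, including those that are hard to read in the scan, can be regenerated mechanically. The Python sketch below is not the authors' combinatorial array method. It goes the long way round, through the moments of $x$ and of $y$, using exact rational arithmetic and the standard moment-cumulant recursions. Its assumptions are:

- $\kappa_1 = 0$, as in (12);
- $y$ is truncated at degree 8;
- $\kappa_r$ is given weight $r - 1$ and only terms of total weight at most 4 (order $\nu^{-4}$) are kept.

All names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

D = 8                    # y = A0 + A1 x + ... + A8 x^8 (series truncated at degree 8)
W = 4                    # keep terms up to order nu^-4, i.e. total weight <= 4
M = W + 1                # only kappa_2 .. kappa_(W+1) can appear at that order
XMAX = 2 * W             # E[x^n] has no term of weight <= W once n > 2W
NV = (D + 1) + (M - 1)   # exponent slots: A0..AD, then kappa_2..kappa_M
ONE = {(0,) * NV: Fraction(1)}


def var(slot):
    e = [0] * NV
    e[slot] = 1
    return {tuple(e): Fraction(1)}


def A(i):
    return var(i)


def kappa(r):
    return var(D + 1 + (r - 2))


def weight(e):
    # kappa_r is of order nu^-(r-1); the A's are of order 1
    return sum(e[D + 1 + (r - 2)] * (r - 1) for r in range(2, M + 1))


def add(p, q, sign=1):
    out = defaultdict(Fraction, p)
    for e, c in q.items():
        out[e] += sign * c
    return {e: c for e, c in out.items() if c != 0}


def mul(p, q):
    out = defaultdict(Fraction)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            if weight(e) <= W:   # safe to truncate: weights are additive and >= 0
                out[e] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def scale(p, c):
    return {e: c * v for e, v in p.items()}


def x_moments():
    """E[x^n], n = 0..XMAX, from m_n = sum_j C(n-1, j-1) kappa_j m_(n-j), kappa_1 = 0."""
    m = [ONE]
    for n in range(1, XMAX + 1):
        acc = {}
        for j in range(2, min(n, M) + 1):
            acc = add(acc, scale(mul(kappa(j), m[n - j]), Fraction(comb(n - 1, j - 1))))
        m.append(acc)
    return m


def y_cumulants(rmax):
    """K_1..K_rmax of y, truncated at order nu^-W, computed via the moments of y."""
    m = x_moments()
    y = [A(i) for i in range(D + 1)]
    power, mu = [ONE], [ONE]
    for _ in range(rmax):
        new = [{} for _ in range(XMAX + 1)]   # coefficients of x^0..x^XMAX in y^r
        for i, pi in enumerate(power):
            for j, yj in enumerate(y):
                if i + j <= XMAX:
                    new[i + j] = add(new[i + j], mul(pi, yj))
        power = new
        ey = {}
        for n, cn in enumerate(power):
            ey = add(ey, mul(cn, m[n]))
        mu.append(ey)
    K = [None]
    for r in range(1, rmax + 1):   # K_r = mu_r - sum_k C(r-1, k-1) K_k mu_(r-k)
        acc = mu[r]
        for k in range(1, r):
            acc = add(acc, scale(mul(K[k], mu[r - k]), Fraction(comb(r - 1, k - 1))), -1)
        K.append(acc)
    return K


def coeff(poly, a=(), k=()):
    """Coefficient of prod(A_i for i in a) * prod(kappa_r for r in k)."""
    e = [0] * NV
    for i in a:
        e[i] += 1
    for r in k:
        e[D + 1 + (r - 2)] += 1
    return poly.get(tuple(e), Fraction(0))
```

For example, `coeff(y_cumulants(2)[2], a=(1, 7), k=(2, 2, 2, 2))` returns the coefficient of $A_1A_7\kappa_2^4$ in $K_2$. Raising `W`, or the order passed to `y_cumulants`, extends the check at the cost of run time.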
DISTRIBUTION OF THE DETERMINANT OF THE SUM
OF PRODUCTS MATRIX IN THE NON-CENTRAL
LINEAR CASE FOR SOME VALUES OF p


By O. P. BAGAI
Government College, Ludhiana, India 


SUMMARY. The distribution of the determinant of a non-central Wishart matrix of order $p$, with a non-centrality matrix of rank one, is worked out for $p = 2, 3$ and $4$.


1. GENERALIZED VARIANCE AND ITS MOMENTS 


Wilks (1932) defines the generalized variance to be the determinant of the variance-covariance matrix and considers it a measure of the spread of the observations. Let $S$ be the sample variance and covariance matrix with $n$ degrees of freedom (d.f.), and let $\Sigma_{(p\times p)} = E(nS)$. The $h$-th moment of $|A| = |nS|$ in the central case is given by Wilks (1932). Let $k_i^2$ $(i = 1, 2, \dots, p)$ be the positive roots of the determinantal equation

$$|T - k^2\Sigma| = 0,$$

where $T$ is the non-centrality matrix of $S$. Assuming $k_i^2 = 0$ $(i = 2, 3, \dots, p)$ and $k_1^2 \neq 0$, Anderson (1946) gives the $h$-th moment of $|A|$ in the non-central case as follows:


Eu LT I 
BJA| = 2% exp (09 т р za E. ) d 
2 \ 


(1.1) 
pur) ) 


Using this moment, we find below the distribution of $|A|$ in the non-central linear case for $p$ equal to 2, 3 and 4.
2. SOME PRELIMINARIES AND INTEGRALS 


We list below Legendre's duplication formula and the values of some definite 
integrals obtained from standard books of tables on integrals. Also we list two 
other definite integrals which we have ourselves evaluated and published elsewhere. 


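(i) Legendre's duplication formula for the gamma function:

$$\Gamma(a)\,\Gamma\!\left(a + \tfrac{1}{2}\right) = 2^{1-2a}\sqrt{\pi}\,\Gamma(2a). \qquad (2.1)$$

(ii) For $a > 0$, Larsen's Table (1948) gives

$$\int_0^\infty \exp\!\left[-\left(x^2 + a^2x^{-2}\right)\right]dx = \frac{\sqrt{\pi}}{2}\exp(-2a). \qquad (2.2)$$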
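Both (2.1) and (2.2) are easy to spot-check numerically. The sketch below uses Python's standard `math` module. The midpoint-rule step count and the cut-off of the integral at $x = 12$ are arbitrary choices of ours; the integrand beyond that point is below $e^{-144}$.

```python
import math


def larsen_lhs(a, n=100_000, upper=12.0):
    """Midpoint-rule value of the integral of exp(-(x^2 + a^2 x^-2)) over (0, upper)."""
    h = upper / n
    return h * sum(math.exp(-(x * x + a * a / (x * x)))
                   for x in ((i + 0.5) * h for i in range(n)))


def larsen_rhs(a):
    """Right-hand side of (2.2)."""
    return math.sqrt(math.pi) / 2 * math.exp(-2 * a)


def duplication_gap(a):
    """Relative difference between the two sides of (2.1)."""
    lhs = math.gamma(a) * math.gamma(a + 0.5)
    rhs = 2 ** (1 - 2 * a) * math.sqrt(math.pi) * math.gamma(2 * a)
    return abs(lhs - rhs) / rhs


for a in (0.5, 1.0, 2.0):
    print(a, larsen_lhs(a), larsen_rhs(a), duplication_gap(a))
```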
(iii) In Table 98 (pp. 143-144), Bierens de Haan gives the following two integrals:*


T 97 окр ort] de 
© (a-d-1—n)"A 


=(q/p)"" exp (—24/pq) ут 2 Gar E (2:0 


and 


=—з 


& 173 exp [—(pz--qz-!)lx 
а ; эл $ (a—n)n 
= (play" exp (—2y/pq) ул 2, £p 


(iv) We now give two other integrals which we have evaluated ourselves (published elsewhere). Their evaluation is outlined as follows:


(2.4) 


(а) Consider 7 = 7 & exp [— 2(2--ах-1)] а. 
0 - 


Tt can be easily evaluated by setting а = io = 2,q = 2a in (2.3); but we have eva- 
luated it by a different method as follows : 


1 


Set ж = 1 u and b= 4а. We obtain Ko) =4 


satisfies the following differential equation : 


ФК К 
be WK =O 


Solving this differential equation, we set b = 4a and obtain : 


- = . 
Г изехр (—u-!—bu)dy which 
0 


ў, 2 exp [ера] ш [ Em) 8 4a ] 


S Ga) р (4a)? , 11 (4a)? 2.5 
A neta a CAH 0.) E 
where $\gamma$ is Euler's constant.
(b) Consider the integral 
La) = 2 7 а?" exp ([—2—а2- ах ... (2.6) 
0 
where $a$ is real and positive. It satisfies the differential equation
d3L, 2 
3 a ett BL, = 0. ex (2.7) 


* In both of these, Kramp's notation is used, namely, $a^{n|r} = a(a+r)(a+2r)\cdots\{a+(n-1)r\}$.
Solving this equation, we have

© 274-2 
La) = E 2—2 "bush Ye 1 ж к, 
o(a) (7 ова) Ç CU шоу. ; 


23 4(124) 
hr tap “pape? =] 


s QS СИР 1 

rafe bx tl euet 20 

9 a^ = Uu giri 
La) = 21--9— lega] $ E ( es E 


+[1 pat] 


3 r- Xati 
—Г@) (a+ x (—1) в &) no (2.9) 


ща) = НОГ) log a4 — € (—тун =". oo 9] 


ЕТШЕ 16:0 
+Г@) (1L a* Lats вт) 


Dabo ВЕ qi 


iN | и ve 
(8) (a ЗО 71.9. Т1 


11 X $ 
e. ... (2.10) 


ОЕ ох, SOIT ERI 
113 135 urs 


ДЕЗЕ Lae ОД 


$L_r(a)$ for $r = 3, 4, \ldots$ can be evaluated in the same way as above by obtaining the solutions of their respective differential equations.


3. DERIVATION OF THE SUITABLE FORM FOR FINDING
THE DISTRIBUTION OF |A|
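From (1.1) it is clear that

$$E|A|^h = E(u_0u_1\cdots u_{p-1})^h,$$

if the joint density of the $u_i$'s is

$$\prod_{i=0}^{p-1}\frac{u_i^{\frac12(n-i)-1}\exp(-\frac12u_i)}{2^{\frac12(n-i)}\,\Gamma\{\frac12(n-i)\}}\;\exp(-\tfrac12k_1^2)\sum_{j=0}^{\infty}\frac{u_0^j(\frac12k_1^2)^j}{j!\,n(n+2)\cdots(n+2j-2)},$$

$(0<u_i<\infty)$, $(i=0,1,2,\ldots,p-1)$.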
From this it follows that the distribution of $|A|$ is the same as that of $u_0u_1\cdots u_{p-1}$. After a little manipulation and setting $n = 2m+p+1$ ($p < n$), the joint distribution of the $u_i$ ($i = 0, 1, 2, \ldots, p-1$) becomes


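$$\frac{\prod_{i=0}^{p-1} u_i^{\,m+\frac12(p-i-1)}}{2^{\,pm+\frac14 p(p+3)}\prod_{r=0}^{p-1}\Gamma(m+\frac12 r+1)}\exp\Big(-\tfrac12\sum_{i=0}^{p-1}u_i\Big)\exp(-\tfrac12k_1^2)\Big[1+\frac{u_0(\frac12k_1^2)}{1!\,(2m+p+1)}+\frac{u_0^2(\frac12k_1^2)^2}{2!\,(2m+p+1)(2m+p+3)}+\cdots\Big]\prod_{i=0}^{p-1}du_i \qquad (3.1)$$

$(0 < u_i < \infty)$, $(i = 0, 1, \ldots, p-1)$.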
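Our reading of (3.1) is that $u_1, \ldots, u_{p-1}$ are independent central $\chi^2$ variates with $n-1, \ldots, n-p+1$ degrees of freedom, while the bracketed series makes $u_0$ a non-central $\chi^2$ variate with $n = 2m+p+1$ degrees of freedom and non-centrality $k_1^2$. The sketch below works under those assumptions, with $\Sigma = I$; all function names are hypothetical. It integrates the $u_0$ factor numerically (total mass 1, mean $n+k_1^2$) and, for $p = 2$, compares a simulated $E|A|$ with $E(u_0)E(u_1) = (n+k_1^2)(n-1)$.

```python
import math
import random


def u0_density(u: float, n: int, k2: float) -> float:
    """The u_0 factor of (3.1): chi-square(n) density x exp(-k2/2) x bracketed series."""
    if u <= 0.0:
        return 0.0
    log_base = ((n / 2 - 1) * math.log(u) - u / 2
                - (n / 2) * math.log(2.0) - math.lgamma(n / 2))
    # term_j = u^j (k2/2)^j / (j! n (n+2) ... (n+2j-2)), built by a ratio recurrence
    term, series, j = 1.0, 1.0, 0
    while True:
        j += 1
        term *= u * (k2 / 2) / (j * (n + 2 * j - 2))
        series += term
        if term <= 1e-17 * series:
            break
    return math.exp(log_base - k2 / 2) * series


def simpson_moments(n: int, k2: float, upper: float = 200.0, panels: int = 4000):
    """(total mass, mean) of the u_0 factor by composite Simpson on [0, upper]."""
    h = upper / panels
    mass = mean = 0.0
    for i in range(panels + 1):
        w = 1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0)
        u = i * h
        f = u0_density(u, n, k2)
        mass += w * f
        mean += w * u * f
    return mass * h / 3.0, mean * h / 3.0


def mean_det_p2(n: int, k2: float, trials: int = 20_000, seed: int = 12345) -> float:
    """Monte Carlo E|A| for A = sum of n outer products of bivariate normals (covariance I).
    Only the first observation has a non-zero mean, (sqrt(k2), 0), so the non-centrality
    matrix has the single non-zero root k2 (an assumed set-up)."""
    rng = random.Random(seed)
    shift = math.sqrt(k2)
    total = 0.0
    for _ in range(trials):
        a11 = a22 = a12 = 0.0
        for obs in range(n):
            x1 = rng.gauss(shift if obs == 0 else 0.0, 1.0)
            x2 = rng.gauss(0.0, 1.0)
            a11 += x1 * x1
            a22 += x2 * x2
            a12 += x1 * x2
        total += a11 * a22 - a12 * a12
    return total / trials


n, k2 = 10, 4.0
mass, mean = simpson_moments(n, k2)
print(f"mass {mass:.6f}, E(u0) {mean:.4f} vs n + k2 = {n + k2}")
print(f"simulated E|A| {mean_det_p2(n, k2):.1f} vs (n + k2)(n - 1) = {(n + k2) * (n - 1)}")
```

The simulation only checks a first moment; it is a consistency check on the factorisation, not a derivation of (3.1).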


4. DISTRIBUTION OF |A| UP TO THE ORDER 4 IN
THE NON-CENTRAL LINEAR CASE


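Case 1: For $p = 2$, the joint distribution of $u_0$ and $u_1$ from (3.1) is

$$\frac{u_0^{m+\frac12}\,u_1^{m}}{2^{2m+\frac52}\,\Gamma(m+1)\,\Gamma(m+\frac32)}\exp\{-\tfrac12(u_0+u_1)\}\exp(-\tfrac12k_1^2)\Big[1+\frac{u_0(\frac12k_1^2)}{1!\,(2m+3)}+\frac{u_0^2(\frac12k_1^2)^2}{2!\,(2m+3)(2m+5)}+\cdots\Big]du_0\,du_1, \qquad (4.1)$$

$(0 < u_0, u_1 < \infty)$.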


Making use of (2.1) and setting $u_0u_1 = V_1^2$, $u_1 = 2V_2$, we get from (4.1) the simultaneous distribution of $V_1$ and $V_2$ as follows:


S уен РА vi 
Va emp РС) (7) 
и в у! ks 
14 4 Vs 
[i н "21g xis [4 


$(0 < V_1, V_2 < \infty)$.
The distribution of $V_1$ ($= \sqrt{u_0u_1}$) is then


2 exp(—4MDVpeeqy, 7 y; 
Ут PGm43) С [ exp P n) 
Y B v kt 
1 2 ES 
| (d ШО Ез Fajar, ve (4.2) 
$(0 < V_1 < \infty)$.
oo 
Now consider, TE [ а 
| ҮЕ exp { ayg jie 
For $r = 0$, we use (4.2) and obtain,
[epp (i ау, = V" exp (—V. 
| д") 3 p (—V1) e (43) 
and for $r \neq 0$ we set $V_2 = t$ and the integral $J_r$ reduces to
oo 
L=} | t= exp (2-71) dt. 
We now use (2.3) and obtain, for $r \neq 0$,


oo ^ 


or ‚= М" exp (=, 2 (44) 
aa VETUS EE Aree MS 
where d — [5 ] р Е эү 1. ... (45) 


Thus, with the help of (4.3) and (4.4), we obtain the distribution of $V_1$ ($= \sqrt{u_0u_1}$) from (4.2) as follows:


vim exp (—V,—18) Е Е 
Г(2т--2) 1! 2m4-3 


T, и ] 
tor Gm FMF’) ate Ja: © (4.6) 


where $0 < V_1 < \infty$, $m = \frac12(n-3)$, and $T_r$ ($r \neq 0$) is defined by (4.5).


It follows from (4.6) that in the central case, i.e., when $k_1^2 = 0$ and $m = \frac12(n-3)$, the distribution of $V_1 = \sqrt{u_0u_1}$ is

$$\frac{1}{\Gamma(n-1)}\,V_1^{\,n-2}\exp(-V_1)\,dV_1, \qquad 0 < V_1 < \infty, \qquad (4.7)$$

which is that of a gamma variate with parameter $(n-1)$.
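The gamma form of (4.7) can be checked through exact moments, without any integration. Assuming, as in our reading of (3.1), that in the central case $u_0$ and $u_1$ are independent $\chi^2$ variates with $n$ and $n-1$ degrees of freedom, $E V_1^h = E u_0^{h/2}\,E u_1^{h/2}$. The duplication formula (2.1) then turns this into $\Gamma(n-1+h)/\Gamma(n-1)$, the $h$-th moment of a gamma variate with parameter $n-1$. A short sketch (function names hypothetical):

```python
import math


def log_moment_v1(n: int, h: float) -> float:
    """log E[V_1^h] for V_1 = sqrt(u0 * u1), u0 ~ chi2(n), u1 ~ chi2(n - 1) independent.

    Uses E[chi2(k)^s] = 2^s Gamma(k/2 + s) / Gamma(k/2) with s = h/2 for each factor.
    """
    s = h / 2.0
    return (h * math.log(2.0)
            + math.lgamma(n / 2 + s) - math.lgamma(n / 2)
            + math.lgamma((n - 1) / 2 + s) - math.lgamma((n - 1) / 2))


def log_moment_gamma(shape: float, h: float) -> float:
    """log E[G^h] for G with density g^(shape - 1) e^(-g) / Gamma(shape)."""
    return math.lgamma(shape + h) - math.lgamma(shape)


for n in (4, 9, 25):
    for h in (1, 2, 3.5):
        print(n, h, math.exp(log_moment_v1(n, h)), math.exp(log_moment_gamma(n - 1, h)))
```

For $h = 1$ both columns give $n - 1$, the mean of the gamma variate.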
Case 2: For $p = 3$, the joint distribution of $u_0$, $u_1$ and $u_2$ from (3.1) can be written as follows:


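$$\frac{u_0^{m+1}\,u_1^{m+\frac12}\,u_2^{m}}{2^{3m+\frac92}\,\Gamma(m+1)\,\Gamma(m+\frac32)\,\Gamma(m+2)}\exp\Big(-\tfrac12\sum_{i=0}^{2}u_i\Big)\exp(-\tfrac12k_1^2)\Big[1+\frac{u_0(\frac12k_1^2)}{1!\,(2m+4)}+\frac{u_0^2(\frac12k_1^2)^2}{2!\,(2m+4)(2m+6)}+\cdots\Big]du_0\,du_1\,du_2, \qquad (4.8)$$

where $0 < u_0, u_1, u_2 < \infty$.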
Using (2.1) and setting uuu, = Vi, мүш = V3, uy = 2V3 in (4.8) we obtain the distribution of V, after a little manipulation as follows :


1 Vfexp(—iH) fo | 
Vn T(m4-1)T (2m 1-3) "E up a eim ( yi зу} 5j 
viu 7 kh 1 y ‹ 
E ТИ mpa 31 т тв) + aera? Дыр 


where $0 < V_1 < \infty$.


i 7 79714 Аан 3 a 2r Y? Ec: 
Using (4.2), | exp Я ву; jars = 42m V$ exp | vw s] 
Then (4.9) reduces to 


Трехр ( а) 


у 
Э"Г(и-1) Г(т-3) „Р (^r s = 
в v kf 
№ "a бийс? y Javar, E 


where $0 < V_1 < \infty$.


Now making use of the integral (2.6) for $r = 0, 1, 2, \ldots$, given respectively in (2.3), (2.9) and (2.10), etc., and remembering that $a$ in (2.6) is equal to (#V,)!, the distribution of V (= upu) is


УР exp (— iB [2 7 Ty), H в (4/3) 


Ё 2 HT(m -- Y)T(2m 4-3) 1! 


+ mp4 
a МЗ) 


21 н 086) 7)" "m 
where $0 < V_1 < \infty$ and $m = (n-4)/2$.


For the central case, i.e., when $k_1^2 = 0$ and $m = (n-4)/2$, (4.11) becomes


У" 


—— ео тр P 
eR —1) T (n—1) L, (М$7,) (0 < Fi < 00) ... (4.12) 


where Ly (4/3V,) is defined by (2.8). 
Case 3: For $p = 4$, the joint distribution of $u_0$, $u_1$, $u_2$ and $u_3$ from (3.1) can be written as follows:
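$$\frac{u_0^{m+\frac32}\,u_1^{m+1}\,u_2^{m+\frac12}\,u_3^{m}}{2^{4m+7}\,\Gamma(m+1)\,\Gamma(m+\frac32)\,\Gamma(m+2)\,\Gamma(m+\frac52)}\exp\Big(-\tfrac12\sum_{i=0}^{3}u_i\Big)\exp(-\tfrac12k_1^2)\Big[1+\frac{u_0(\frac12k_1^2)}{1!\,(2m+5)}+\frac{u_0^2(\frac12k_1^2)^2}{2!\,(2m+5)(2m+7)}+\cdots\Big]du_0\,du_1\,du_2\,du_3, \qquad (4.13)$$

$(0 < u_0, u_1, u_2, u_3 < \infty)$.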


Using (2.1) and setting и, ug Uy % = Vy, мунун = 2V3, ujug = V2, w = 272 in 
(4.13), we obtain the distribution of V,(=2pw us) after a little manipulation as 
follows : 


27% exp (00) rae 


Г Г (- ЕНЕД 1) 
упГ(2т--2) (2-4-4) v,=0 vs=0 V;=0 


AY; Và 4Vi 


[14 Vi М Vi ki 


А 
т зат. ау, (41) 


where $0 < V_1 < \infty$.


Making use of (2.2), we integrate (4.14) with respect to V, and obtain,


Vit ea ӨК р S (=) ` г mL Van an үү кы ЫЎ: р? 
"aT (2m-4-3) F(2m--4) То” exp ( Y, ) pare ( n; i) 
Le ee. в e M dV, dV, ат 415 
[H "1 2m4-5 ta бит) * | а 


where $0 < V_1 < \infty$.


To integrate with respect to $V_4$, we again evaluate the first integral as before by using (2.2), while in the others we set Vj = t and then, using (2.3), obtain in place of (4.15) the following:


V? ехр( 8) ИИ: 
2T(2m4-2) T(2m+4) А dus »( 5) 


ИЕ и 
Шо щт (2m--5)2m4-7) ejm EH o 


where $0 < V_1 < \infty$


9 (rL1—2)" 


ES UN V. 4.17 
and L-i(y) vrap CT) Zu (4.17) 
Further, to evaluate (4.16), we have to use either (2.3) or (2.4) for $p = 1$, $q = \sqrt{V_1}$ and various suitable values of $a$. This determines the distribution of $V_1$ ($= \sqrt{u_0u_1u_2u_3}$), where it should be remembered that $m = \frac12(n-5)$.


For the central case, we set $k_1^2 = 0$ in (4.16) and obtain the distribution of $V_1$ as follows:


а раш УТ: у. ат, ат, 0 < V, <) ... (418 
3T(2m--2) Im 4-4) E AB E Te 7] Ея ©) 09) 
To evaluate (4.18), we can, of course, make use of (2.3) for $a = 3$, $p = 1$, $q = \sqrt{V_1}$, but we prefer to use (2.5) and then write the distribution of $V_1$ in the central case as follows:

n—b 


V, ? dy, (1+-2y)—log а 
T(n—3) T(n—1) | 2 ( 


а? 
+ 3 


Hrs atus o] E 1») 


where $0 < V_1 < \infty$ and $a = \sqrt{V_1}$.


We could also use the following integral, setting $\nu = -2$, $p = 1$, $a = \sqrt{V_1}$:


[72] a Ww -— 
[tesi 3( 1} К, (Мар) (Bateman, р. 146 (29)) 


where $K_\nu(z)$ denotes the modified Bessel function of the second kind.
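The Bateman entry is not legible in our copy. We assume it is the standard integral $\int_0^\infty t^{\nu-1}\exp(-pt - a/t)\,dt = 2(a/p)^{\nu/2}K_\nu(2\sqrt{ap})$ for $p, a > 0$. The sketch below checks that identity numerically, computing $K_\nu$ from its integral representation $K_\nu(z) = \int_0^\infty e^{-z\cosh s}\cosh\nu s\,ds$. The parameter values and helper names are arbitrary choices of ours.

```python
import math


def trapezoid(f, lo: float, hi: float, n: int) -> float:
    """Plain trapezoidal rule; very accurate for the smooth, rapidly decaying integrands used here."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * step)
    return total * step


def bessel_k(nu: float, z: float) -> float:
    """K_nu(z) = int_0^inf exp(-z cosh s) cosh(nu s) ds, for z > 0."""
    return trapezoid(lambda s: math.exp(-z * math.cosh(s)) * math.cosh(nu * s), 0.0, 20.0, 4000)


def lhs(nu: float, p: float, a: float) -> float:
    """int_0^inf t^(nu - 1) exp(-p t - a / t) dt, after the substitution t = e^s."""
    return trapezoid(lambda s: math.exp(nu * s - p * math.exp(s) - a * math.exp(-s)),
                     -40.0, 40.0, 16000)


def rhs(nu: float, p: float, a: float) -> float:
    return 2.0 * (a / p) ** (nu / 2.0) * bessel_k(nu, 2.0 * math.sqrt(a * p))


for nu, p, a in ((-2.0, 1.0, 3.0), (0.5, 2.0, 0.7), (3.0, 1.5, 2.0)):
    print(nu, p, a, lhs(nu, p, a), rhs(nu, p, a))
```

Note that $K_{-\nu} = K_\nu$, so $\nu = -2$ and $\nu = 2$ lead to the same Bessel function.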
POLYSOMIC INHERITANCE AND THE THEORY OF SHUFFLING
By P. A. P. MORAN
Australian National University, Canberra 


SUMMARY. A theory of shuffling is applied to elucidate the difference between chromosomal and chromatid segregation in polysomic inheritance. Attention is drawn to unsolved problems in shuffling theory.


In diploid individuals the chromosomes (other than the sex chromosomes) occur in similar pairs and the process of segregation, considered for factors at a single locus, is quite simple in its end result. Thus, if there are two possible factors, A and B, at a single locus, the gametes produced by an individual of constitution AB will be A or B with probabilities ½, wherever the locus is situated on the chromosomes.
In actual fact the cytological process which brings this about is considerably more 
complicated than might appear from the above statement. 


The situation in autopolyploids is quite different and much more complicated. 
The number of chromosomes in each individual zygote may be even or odd. We confine the discussion in this paper to the case where the number is even, and the more detailed algebra to the case where there are four chromosomes (tetraploids).


Segregation can then occur in more than one way. To illustrate this, suppose that there are only two possible alleles, A and B, at a single locus, so that a gamete or zygote may be written $A^xB^y$ where $x + y = s$. For tetraploids we will have $s = 2$ for a gamete and $s = 4$ for a zygote.


If we have a zygote of form $A^rB^{2m-r}$, its offspring gametes will be of the form $A^xB^{m-x}$, but there is more than one way of determining the probabilities of production of the various types of gamete from a specified zygote. The two cases most frequently considered are known as chromosomal and chromatid segregation, and it appears that most observed cases are either one of these or a mixture of the two.


In chromosomal segregation the gamete is formed by a process which is equivalent to choosing $m$ chromosomes at random, and without replacement, from the set represented by $A^rB^{2m-r}$. Thus the probability of obtaining a gamete of the form $A^xB^{m-x}$ is

$$\binom{r}{x}\binom{2m-r}{m-x}\binom{2m}{m}^{-1}.$$
In chromatid segregation, on the other hand, the gamete, containing $m$ chromosomes, is formed by a process which is equivalent to assuming that each of the $2m$ chromosomes in the parent zygote divides into two, thus forming a set which we may denote as $A^{2r}B^{4m-2r}$, and that from these a selection of $m$ chromosomes is chosen at random without replacement. Thus the probability of obtaining a gamete of form $A^xB^{m-x}$ is

$$\binom{2r}{x}\binom{4m-2r}{m-x}\binom{4m}{m}^{-1}.$$
This results in quite a different set of probabilities and, in particular, makes possible the formation of gametes of types which could not be formed by chromosomal segregation. For example, a zygote of form $AB^3$ will give gametes of form $AB$ and $B^2$ with probabilities ½ each under chromosomal segregation, and gametes of form $AA$, $AB$ and $BB$ with probabilities 1/28, 3/7 and 15/28 under chromatid segregation.
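Both sampling rules are hypergeometric draws, so the example can be reproduced exactly with rational arithmetic. In this sketch (function names ours), `r`, `m` and `x` have the meanings used in the text:

```python
from fractions import Fraction
from math import comb


def chromosomal(r: int, m: int, x: int) -> Fraction:
    """P(gamete A^x B^(m-x)) from zygote A^r B^(2m-r): m of the 2m chromosomes drawn without replacement."""
    return Fraction(comb(r, x) * comb(2 * m - r, m - x), comb(2 * m, m))


def chromatid(r: int, m: int, x: int) -> Fraction:
    """Same, but drawing m of the 4m chromatids A^(2r) B^(4m-2r)."""
    return Fraction(comb(2 * r, x) * comb(4 * m - 2 * r, m - x), comb(4 * m, m))


# Tetraploid (2m = 4) zygote A B^3: r = 1, m = 2; x is the number of A's in the gamete.
for name, rule in (("chromosomal", chromosomal), ("chromatid", chromatid)):
    probs = {g: rule(1, 2, x) for g, x in (("AA", 2), ("AB", 1), ("BB", 0))}
    print(name, {g: str(p) for g, p in probs.items()})
```

`math.comb(n, k)` returns 0 when `k > n`, which is why the impossible gamete $AA$ gets probability 0 under chromosomal segregation.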


Fisher and Mather (1943) have introduced parameters to describe types intermediate between chromosomal and chromatid segregation. Consider a tetraploid zygote. Its diploid gametic offspring might be formed of two chromosomes identical with two different chromosomes chosen at random from the four chromosomes of its parent. This is what always happens in chromosomal segregation, but it happens only with probability 6/7 in chromatid segregation. Alternatively, the two chromosomes in the gamete might be identical with a single chromosome of the parent, chosen at random. This never happens in chromosomal segregation but occurs with probability 1/7 in chromatid segregation. Cases intermediate between chromosomal and chromatid segregation can be described by writing $\alpha$ for the probability of the second mode of formation. Then a zygote of form $AB^3$ will produce gametes of forms $AA$, $AB$ and $BB$ with probabilities $\frac14\alpha$, $\frac12(1-\alpha)$ and $\frac12+\frac14\alpha$. The parameter $\alpha$ thus completely specifies the mode of segregation. With hexaploids, the triploid gamete may be derived from three different chromosomes with probability $1-\beta$, say, or from two only (one supplying two chromatids) with probability $\beta$. Again, a single parameter, $\beta$, completely specifies the mode of segregation, but with octoploids two parameters are necessary.
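The $\alpha$-parametrisation can be checked the same way. In the sketch below (names ours), $\alpha = 0$ should reproduce chromosomal segregation. The value $\alpha = 4/\binom{8}{2} = 1/7$ is the chance that two of the eight chromatids drawn at random come from the same chromosome, and it should reproduce the chromatid probabilities 1/28, 3/7 and 15/28 for an $AB^3$ zygote.

```python
from fractions import Fraction
from math import comb


def ab3_gametes(alpha: Fraction) -> dict:
    """Gamete probabilities from a tetraploid zygote A B^3 under the alpha parametrisation.

    With probability 1 - alpha the gamete copies two different chromosomes
    (AB or BB, each 1/2); with probability alpha it copies one chromosome twice
    (AA with probability 1/4, BB with probability 3/4).
    """
    return {
        "AA": alpha * Fraction(1, 4),
        "AB": (1 - alpha) * Fraction(1, 2),
        "BB": (1 - alpha) * Fraction(1, 2) + alpha * Fraction(3, 4),
    }


# In chromatid segregation, 4 of the C(8, 2) = 28 chromatid pairs are sisters.
alpha_chromatid = Fraction(4, comb(8, 2))
print(alpha_chromatid)
print(ab3_gametes(Fraction(0)))        # chromosomal segregation
print(ab3_gametes(alpha_chromatid))    # chromatid segregation
```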


In general, chromosomal segregation occurs for loci which are near the centromere and chromatid segregation for loci which are far from it. The problem we have to consider is to explain how this comes about in terms of the actual process of segregation.


The gamete contains only one half of the number of chromosomes in the zygote, but it is not produced by a simple process of reduction or halving; it is produced by a process which involves first a duplication and then two reductions.


Suppose that there are $2s$ chromosomes. Meiosis begins with a pairing of these chromosomes along their length, thus forming $s$ pairs, but this pairing does not remain the same along the whole length. Thus if there are four chromosomes, denoted by 1, 2, 3 and 4, we might have a pairing such as (12)(34) near the centromere (the part of the chromosome which controls the ultimate splitting), and at some distance along each arm this may change to (13)(24) or (14)(23), only to change again further along. In a tetraploid the resulting figure is very like that formed by the four chromatids in a diploid just after crossing over, with the difference that the four centromeres are here distinct.


The chromosomes then each split into two chromatids so that there are now 
4s chromatids in all. At any place where two chromosomes were paired there are 
now four chromatids and between these crossing-over occurs. The essential feature 
of the situation is that owing to the interchanges in partners in the pairing of the 
chromosomes crossing-over may not only occur between, say, chromosomes 1 and 2, 
but also between 1 and 3, 1 and 4 and so on, at points where these chromosomes are 
paired. The end result, after a sufficient amount of re-pairing and crossing-over has taken place, is that the $4s$ elements of the chromatids at a locus far from the centromere have been shuffled about so thoroughly that they are attached to the $4s$ portions of chromatid near the centromere in a manner which is effectively random, i.e. they have been “shuffled” into a random permutation. The aim of the mathematical theory is to give a description of this process of shuffling.

The cell then undergoes division twice, giving four cells each containing $s$ chromatids which become the chromosomes of the gamete. In the first division the
2s pairs of chromatids are separated into two sets of s pairs in a completely random 
manner (“random disjunction”). At the second division the two chromatids in each 
pair come apart, beginning at the centromeres, and each enters a different cell. Thus 
we see that near the centromere where there is no re-pairing and crossing-over between 
the locus and the centromere, segregation will be chromosomal whilst a long way 
away from the centromere random shuffling will result in chromatid segregation. 


We now confine ourselves to tetraploids. If there are only two alleles, A and B say, the forms of zygote we have to consider are $A^3B$ and $A^2B^2$ ($AB^3$ having the same theory as $A^3B$). The theory is then particularly simple. However, we consider here the general case of four alleles, so that the zygote is represented by the symbol ABCD. We now introduce a symbol to denote the manner in which the eight elements AABBCCDD, representing the alleles at the locus considered on the eight chromatids, are joined to the four centromeres, which are associated in pairs. Thus if no re-pairing or crossing-over has taken place, the initial state can be described by the symbol

[(AA)(BB)][(CC)(DD)],

or by one of the similar symbols obtained by permuting the letters A, B, C and D. Thus the above symbol indicates that the two chromatids carrying the allele A are joined to the same centromere, which is paired with a centromere joined to the two chromatids carrying B, and similarly for C and D. This symbol can also represent a state in which re-pairing and crossing-over have occurred in such a way that, as far as the locus under consideration is concerned, the joining with the centromeres is as described. We denote by $S_1$ this state and all states obtained from it by permutation of the letters A, B, C and D. We shall also use $S_1$ to denote the sum of the probabilities of this state and its permutations, and since the initial pairing at the centromeres is random, all such permutations are equally probable.
We now have to enumerate all possible configurations, and in order to do so it is a help to classify them. This can be done in two ways. Let $n$ be the total number of doublets, i.e. symbols of the form (AA), (BB), etc. Then $n$ can take the values 0, 1, 2 and 4. Similarly, let $m$ be the total number of different alleles which are joined to one centromere pair. This is equal to the total number joined to the other centromere pair. $m$ can take the values 2, 3 and 4, but not all pairs of values of $m$ and $n$ are possible. Thus if $m = 4$, $n$ must equal 0. The number of possible pairs $(n, m)$ is seven, but to one of these correspond two configurations. Table 1 shows the possible configurations with typical symbols, all symbols obtained by permutation of A, B, C and D being equiprobable.


TABLE 1. THE EIGHT POSSIBLE CONFIGURATIONS FOR A TETRAPLOID

      configuration               n    m
S_1   [(AA)(BB)][(CC)(DD)]        4    2
S_2   [(AB)(AB)][(CC)(DD)]        2    2
S_3   [(AB)(CC)][(AB)(DD)]        2    3
S_4   [(AA)(BC)][(DB)(DC)]        1    3
S_5   [(AB)(AB)][(CD)(CD)]        0    2
S_6   [(AB)(AC)][(BD)(CD)]        0    3
S_7   [(AB)(CD)][(AB)(CD)]        0    4
S_8   [(AB)(CD)][(AC)(BD)]        0    4


We now consider the effect on each of these states of a re-pairing (exchange of partners) or a crossing-over occurring between the centromeres and the re-pairing or crossing-over nearest to the centromeres. Clearly, a re-pairing will turn $S_2$ into $S_3$, and a crossing-over (between non-sister strands) will turn $S_1$ into $S_2$. However, the outcome is not always certain. For example, such a crossing-over may turn $S_2$ into $S_1$, $S_2$ or $S_5$, and will do so with probabilities ¼, ¼ and ½ respectively. Writing such an outcome as $\frac14S_1 + \frac14S_2 + \frac12S_5$ and enumerating all the possible cases, we obtain Table 2.


TABLE 2. EFFECT OF RE-PAIRING AND CROSSING-OVER
NEAREST THE CENTROMERE

original state   effect of re-pairing   effect of cross-over
S_1              S_1                    S_2
S_2              S_3                    ¼S_1 + ¼S_2 + ½S_5
S_3              ½S_2 + ½S_3            S_4
S_4              S_4                    ¼S_3 + ¼S_4 + ½S_6
S_5              S_7                    ½S_2 + ½S_5
S_6              ½S_6 + ½S_8            ½S_4 + ½S_6
S_7              ½S_5 + ½S_7            S_8
S_8              S_6                    ½S_7 + ½S_8
Suppose now that we have a situation in which re-pairing and crossing-over have occurred in a certain order along the arms of the chromosomes between the locus and the centromere. We can regard the final configuration, as seen from the centromeres, as the state of a random process of the type of a finite chain. We suppose that we start from a state in which there is no re-pairing and no crossing-over, and introduce the latter in order, beginning with the one nearest the locus. The process is similar to a Markov chain except that we now have two different types of matrices of transition probabilities, corresponding to re-pairing and crossing-over.


The state after any number of these can be represented by a column vector

$$p = (p_1, p_2, \ldots, p_8)'$$

whose elements are the probabilities of the states $S_1, \ldots, S_8$. The initial state is then

$$p_0 = (1, 0, \ldots, 0)',$$

and the final state will be obtained by pre-multiplying this vector by matrices representing the effects of re-pairing and crossing-over, this pre-multiplication being done in the order in which these events occur, starting from the locus and moving towards the centromere. Let $T_1$ and $T_2$ denote the transition probability matrices associated with re-pairing and crossing-over respectively. Then from Table 2 we see that


ET 0 0 0 0 0 0 071 
ОО Оо 2 0: 2770 
0 1 i 0 0 0 0 0 
T,= 0.0 0 1 0 0 0 0 
О Ос 0 T 
0 0 0 0 0 i 0 1 
Dimer ОАО Olam Ir 10) =F" 0 
ОООО о тос 0 
апа = 
o + 0 0 0 0 0 0 
1 i 0 0 i 0 0 0 
О ONE E0007 -0 
T,= м A qu 0.0 
Dae en mee оо 
О о Ор ото 
0 0 0 0 0 0 0 i 
0 0 0 0 0 0 1 2 


The probabilities of the various possible final states will then be obtained by pre-multiplying $p_0$ by $T_1$ and $T_2$ in the order in which they occur.
$T_1$ is a matrix which, by the re-arrangement of rows and columns, can be represented as a diagonal matrix of five matrices, two of these being of the form (1), and three of the form

$$\begin{pmatrix} 0 & \frac12\\ 1 & \frac12 \end{pmatrix}.$$

Thus the characteristic roots of $T_1$ are $\lambda = 1$ (five times) and $\lambda = -\frac12$ (three times).
Any characteristic post-vector corresponding to the root $\lambda = 1$ will be of the form

$$p = (x_1, \tfrac12x_2, x_2, x_3, \tfrac12x_4, x_5, x_4, \tfrac12x_5)',$$

where $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$ are arbitrary.


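Similarly $T_2$ can be transformed, by exchange of rows and columns, into a diagonal matrix of three matrices, which are

$$\begin{pmatrix} 0 & \frac14 & 0\\ 1 & \frac14 & \frac12\\ 0 & \frac12 & \frac12 \end{pmatrix}$$

twice and

$$\begin{pmatrix} 0 & \frac12\\ 1 & \frac12 \end{pmatrix}$$

once. Thus the characteristic roots of $T_2$ are $\lambda = 1$ (three times), $\lambda = \frac14$ (twice) and $\lambda = -\frac12$ (three times), and any characteristic post-vector for the root $\lambda = 1$ must be of the form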


$$p = (y_1, 4y_1, y_2, 4y_2, 4y_1, 4y_2, y_3, 2y_3)'.$$


If we compare this with the general form of the characteristic post-vector of $T_1$ for $\lambda = 1$, we see that any characteristic post-vector ($\lambda = 1$) of both matrices must be a multiple of

$$(1, 4, 8, 32, 4, 32, 8, 16)',$$

and since, to represent probabilities, the elements must add to unity, it must therefore be

$$\pi = \Big(\frac{1}{105}, \frac{4}{105}, \frac{8}{105}, \frac{32}{105}, \frac{4}{105}, \frac{32}{105}, \frac{8}{105}, \frac{16}{105}\Big)'.$$
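The claim that $\pi$ is a common characteristic post-vector of $T_1$ and $T_2$ can be verified exactly. The sketch below encodes Table 2 column by column (old state mapped to the distribution of the new state) and applies each operation to $\pi$ using Python's `fractions` module. The dictionary encoding and names are our own.

```python
from fractions import Fraction as F

h, q = F(1, 2), F(1, 4)
# Table 2 as {old state: {new state: probability}}, states numbered 1..8.
REPAIR = {1: {1: 1}, 2: {3: 1}, 3: {2: h, 3: h}, 4: {4: 1},
          5: {7: 1}, 6: {6: h, 8: h}, 7: {5: h, 7: h}, 8: {6: 1}}           # T_1
CROSS = {1: {2: 1}, 2: {1: q, 2: q, 5: h}, 3: {4: 1}, 4: {3: q, 4: q, 6: h},
         5: {2: h, 5: h}, 6: {4: h, 6: h}, 7: {8: 1}, 8: {7: h, 8: h}}      # T_2
PI = {i + 1: F(c, 105) for i, c in enumerate((1, 4, 8, 32, 4, 32, 8, 16))}


def step(table, p):
    """Pre-multiply the probability vector p (a dict state -> probability) by the matrix of `table`."""
    out = {}
    for old, prob in p.items():
        for new, t in table[old].items():
            out[new] = out.get(new, 0) + prob * t
    return out


def as_vector(p):
    return [p.get(i, F(0)) for i in range(1, 9)]


print(as_vector(step(REPAIR, PI)) == as_vector(PI))  # T_1 pi = pi
print(as_vector(step(CROSS, PI)) == as_vector(PI))   # T_2 pi = pi
```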


We now enquire under what circumstances an unlimited repetition of the operations $T_1$ and $T_2$ results in convergence of the vector representing the probabilities to $\pi$.

That this cannot hold in general can be seen as follows. Consider first the operation $T_1T_2$ performed indefinitely often on the initial vector $p_0$.
Since $\pi$ is a characteristic post-vector of both $T_1$ and $T_2$ for $\lambda = 1$, it is also a characteristic post-vector for $T_1T_2$. We can show that it is the sole post-vector for $\lambda = 1$, so that unity is a simple root of $T_1T_2$, by considering the accessibility of states in each operation. Thus in the operation $T_2$ state 2 is accessible from state 1 and is the only such state, whilst states 1, 2 and 5 are accessible from state 2, and so on. Table 3 shows the states accessible from each state in the operations $T_1$ and $T_2$ and also in the operation $T_1T_2$.


TABLE 3. ACCESSIBLE STATES

initial state   states accessible with
                T_1      T_2        T_1T_2
1               1        2          3
2               3        1, 2, 5    1, 3, 7
3               2, 3     4          4
4               4        3, 4, 6    2, 3, 4, 6, 8
5               7        2, 5       3, 7
6               6, 8     4, 6       4, 6, 8
7               5, 7     8          6
8               6        7, 8       5, 6, 7


From this it can be verified that $(T_1T_2)^4$ is such that every state is accessible from every state. Such a matrix is said to be positively regular and is known to have only a single root of unit modulus. Thus

$$(T_1T_2)^\infty p_0 = \lim_{n\to\infty}(T_1T_2)^n p_0 = \pi,$$

and similarly $(T_2T_1)^4$ has all its elements non-zero and is thus also positively regular. On the other hand it is easily seen that

$$\lim_{m\to\infty}\,\lim_{n\to\infty} T_1^m T_2^n p_0 \neq \pi,$$

since it can be verified that for no values of $m$ and $n$ are states 4, 6 and 8 accessible by means of $T_1^mT_2^n$ from state 1. In a similar way

$$\lim_{n\to\infty}\,\lim_{m\to\infty} T_2^n T_1^m p_0 \neq \pi.$$

Thus we have a situation in which the ordinary limiting behaviour of a Markov chain, with a single matrix of transition probabilities, does not occur.
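A small numerical experiment shows the contrast clearly. Alternating the two operations drives the state vector to $\pi$. Applying all cross-overs before all re-pairings never puts any probability on states 4, 6 and 8. The sketch below re-encodes Table 2 in floating point (names ours); 1000 alternations and the ranges of $m$ and $n$ are arbitrary choices.

```python
h, q = 0.5, 0.25
# Table 2 as {old state: {new state: probability}}, states numbered 1..8.
REPAIR = {1: {1: 1}, 2: {3: 1}, 3: {2: h, 3: h}, 4: {4: 1},
          5: {7: 1}, 6: {6: h, 8: h}, 7: {5: h, 7: h}, 8: {6: 1}}           # T_1
CROSS = {1: {2: 1}, 2: {1: q, 2: q, 5: h}, 3: {4: 1}, 4: {3: q, 4: q, 6: h},
         5: {2: h, 5: h}, 6: {4: h, 6: h}, 7: {8: 1}, 8: {7: h, 8: h}}      # T_2
PI = [c / 105 for c in (1, 4, 8, 32, 4, 32, 8, 16)]


def step(table, p):
    """One pre-multiplication p -> T p, with p stored as a list indexed by (state - 1)."""
    out = [0.0] * 8
    for old in range(1, 9):
        for new, t in table[old].items():
            out[new - 1] += p[old - 1] * t
    return out


def alternate(pairs):
    """(T_1 T_2)^pairs p_0: each pair applies T_2 (cross-over) first, then T_1 (re-pairing)."""
    p = [1.0] + [0.0] * 7
    for _ in range(pairs):
        p = step(REPAIR, step(CROSS, p))
    return p


def crossings_then_repairings(m, n):
    """T_1^m T_2^n p_0: n cross-overs act first, then m re-pairings."""
    p = [1.0] + [0.0] * 7
    for _ in range(n):
        p = step(CROSS, p)
    for _ in range(m):
        p = step(REPAIR, p)
    return p


gap = max(abs(x - y) for x, y in zip(alternate(1000), PI))
print(f"max |(T1 T2)^1000 p0 - pi| = {gap:.1e}")
unreached = max(crossings_then_repairings(m, n)[i - 1]
                for m in range(20) for n in range(20) for i in (4, 6, 8))
print("largest probability of states 4, 6, 8 under T1^m T2^n:", unreached)
```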


Only with some further restrictions on the order of application of $T_1$ and $T_2$ can we conclude that, after a number of repetitions of $T_1$ and $T_2$, the probabilities of the various states will be given by $\pi$. We have already seen that this will occur if they alternate. This is not the sort of condition which is plausible biologically. Suppose


that we assume that $n$ exchanges of partner occur between the centromere and the locus, and that in the $n+1$ intervals formed by these, crossing-over occurs with frequencies which have the same discrete non-degenerate probability distribution. Then clearly, by the same type of argument as used above, as $n$ increases, the frequency distribution of crossing-over remaining unaltered, the final distribution will tend to $\pi$. Thus when we get well away from the centromere the probabilities of the various states will be well approximated by the elements of $\pi$.


We shall now show that if this is so, segregation is of the chromatid type, and also see how the parameter $\alpha$ can be expressed in terms of the sequence of $T_i$'s. We have to calculate the probabilities $p(AA), \ldots$ of obtaining gametes of the forms AA, ..., DD, AB, AC, ..., CD. From the initial assumption that each state $S_i$ corresponds to all possible permutations of the symbols A, B, C and D with equal probabilities, we have

$$p(AA) = p(BB) = p(CC) = p(DD) = \tfrac14\alpha, \text{ say},$$

and

$$p(AB) = \cdots = p(CD) = \tfrac16(1-\alpha).$$


Each state will have its own value of $\alpha$, which we write as $\alpha_i$. By straightforward examination we can evaluate these as in Table 4.


TABLE 4. VALUES OF $\alpha_i$


If we write these values as a row vector

$$a = (0, \tfrac1{12}, \tfrac1{12}, \tfrac18, \tfrac16, \tfrac16, \tfrac16, \tfrac16)$$

and the probabilities of the states are given by a column vector $p$, the value of $\alpha$ will be

$$\alpha = ap.$$


In particular we see that $a\pi = \frac17$, and this corresponds to chromatid segregation, for which $p(AA) = \cdots = p(DD) = \frac14\cdot\frac17 = \frac1{28}$. Thus in this way we can find the correct value of $\alpha$ corresponding to any specified sequence of re-pairing and crossing-over.
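The values $\alpha_i$ can be reproduced from Table 1 under one reading of the disjunction step described earlier. We assume that any two of the four centromeres are equally likely to pass into a gamete, and that the gamete then receives one of the two chromatids of each at random. Under that assumption (ours, for illustration), the sketch recovers the vector $a$ and $a\pi = 1/7$:

```python
from fractions import Fraction as F
from itertools import combinations

# Centromere contents of S_1 ... S_8, read off Table 1. The bracket (pairing) structure
# is dropped because, under the random-disjunction reading assumed here, only the
# contents of the centromere pair reaching the gamete matter.
CONFIGS = [
    ["AA", "BB", "CC", "DD"],
    ["AB", "AB", "CC", "DD"],
    ["AB", "CC", "AB", "DD"],
    ["AA", "BC", "DB", "DC"],
    ["AB", "AB", "CD", "CD"],
    ["AB", "AC", "BD", "CD"],
    ["AB", "CD", "AB", "CD"],
    ["AB", "CD", "AC", "BD"],
]
PI = [F(c, 105) for c in (1, 4, 8, 32, 4, 32, 8, 16)]


def alpha_of(centromeres):
    """P(both chromatids of the diploid gamete carry the same allele)."""
    pairs = list(combinations(centromeres, 2))  # 6 equally likely centromere pairs
    same = F(0)
    for c1, c2 in pairs:
        matches = sum(1 for x in c1 for y in c2 if x == y)
        same += F(matches, 4)                   # 2 x 2 equally likely chromatid choices
    return same / len(pairs)


a = [alpha_of(c) for c in CONFIGS]
alpha_bar = sum(ai * pi for ai, pi in zip(a, PI))
print([str(x) for x in a])
print("a . pi =", alpha_bar)
```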
SOME GENERAL PROBLEMS

The above results suggest some general problems about what may be called non-homogeneous processes of Markov chain type. In the problem considered above we had two matrices of transition probabilities, both of which had multiple unit roots, and a single common right characteristic vector for the value $\lambda = 1$. Consider a simpler situation in which we have a process which can be in any of $n$ states and two matrices of transition probabilities, $T_1 = (p_{ij})$ and $T_2 = (p'_{ij})$, which each have only a single simple root of modulus unity (which must equal unity), and a common right characteristic vector. Assume further, for the sake of simplicity, that for each of these matrices the other roots are unequal. Let $\lambda_1 = 1, \lambda_2, \ldots, \lambda_n$ be the roots of $T_1$, with right characteristic vectors $t_1, \ldots, t_n$, and similarly $\mu_1 = 1, \mu_2, \ldots, \mu_n$ and $u_1 = t_1, u_2, \ldots, u_n$ for $T_2$.


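Then the $t_i$ are linearly independent, and if $p_0$ is any initial vector of probabilities we can write

$$p_0 = t_1 + \sum_{i=2}^{n} a_i t_i$$

(the coefficient of $t_1$ is unity when $t_1$ is scaled so that its elements add to unity, because $T_1$ preserves the sum of the elements of any vector, so that $t_2, \ldots, t_n$ have elements adding to zero), so that

$$T_1^N p_0 = t_1 + \sum_{i=2}^{n} a_i\lambda_i^N t_i.$$

Then as $N$ tends to infinity, $T_1^N p_0$ converges to $t_1$. In fact, since we can write the difference between these as

$$T_1 p - t_1 = \sum_{i=2}^{n} a_i\lambda_i t_i,$$

we see that, in a certain sense, $T_1p$ is closer to $t_1$ than $p$ is, for any vector $p$ of probabilities. The sense in which this is true is that the length of the vector $p - t_1$ is always reduced by the operator $T_1$ when the length is defined in terms of the coordinate system defined by the $n$ vectors $t_1, \ldots, t_n$, which span the whole space since they are linearly independent.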
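The contraction argument can be seen in the simplest possible case. Take a hypothetical two-state matrix $T = \begin{pmatrix}1-a & b\\ a & 1-b\end{pmatrix}$ with columns summing to one. Then $p - t_1$ is always a multiple of $(1, -1)'$, which is a characteristic vector for the second root $\lambda_2 = 1-a-b$, so each application of $T$ multiplies the deviation by exactly $\lambda_2$. A short sketch (the values of $a$ and $b$ are arbitrary):

```python
def step(T, p):
    """Column-vector update p -> T p for a 2 x 2 matrix T given as nested lists."""
    return [T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1]]


def deviation_ratios(a: float, b: float, steps: int = 6):
    """Ratios (T p - t1) / (p - t1), first coordinate, along T^N p_0 for T = [[1-a, b], [a, 1-b]]."""
    T = [[1 - a, b], [a, 1 - b]]          # columns sum to 1
    t1 = [b / (a + b), a / (a + b)]       # right characteristic vector for lambda_1 = 1
    p = [1.0, 0.0]                        # p_0
    ratios = []
    for _ in range(steps):
        before = p[0] - t1[0]
        p = step(T, p)
        ratios.append((p[0] - t1[0]) / before)
    return ratios


print(deviation_ratios(0.3, 0.1))  # every ratio is lambda_2 = 1 - a - b = 0.6, up to rounding
```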

A similar result holds for the operator $T_2$, which reduces in length any vector $p - u_1$, $p$ a vector of probabilities, the length being now defined in terms of the coordinate system defined by the $n$ linearly independent vectors $u_1, \ldots, u_n$.

These results suggest the following conjecture. Suppose that $p_0$ is any vector of probabilities and that

$$S_N = T_{i_N}T_{i_{N-1}}\cdots T_{i_2}T_{i_1},$$

where $i_1, \ldots, i_N$ is any fixed sequence of 1's and 2's.

Conjecture. As $N$ tends to infinity, $S_Np_0$ tends to $t_1$ for any prescribed sequence $i_1, i_2, \ldots$.
This result is false. Consider a process with four states and two transition matrices
0 1 


md 0 i $ | ГУ 0 0 
Arae (50. 20 о. $ 
Ti Т = 
QUE UE OEC fer 00 -5 10 
Б AE! 0x0 c d. $ 


$T_2$ is obtained from $T_1$ by interchanging the first two rows and the first two columns. The characteristic equation of $T_1$ is


AA. —1)0*-- 1-1) = 0 


and the roots are А = 1, А = —}(1 + 4/—3), А = 0 so that there is only one root of 
modulus unity. The characteristic equation of $T_2$ is the same, and both have the vector $(\frac14, \frac14, \frac14, \frac14)$ as the right characteristic vector corresponding to $\lambda = 1$, as is otherwise obvious since both matrices are doubly stochastic.


‘Then E 
ШӘ xc uc 
0 1 0 0 
TT = 
Len MENS | 
eo Or D 


This has the characteristic equation 
ҖА—1)# (A—1) = 0 

so that $\lambda = 1$ is a double root, with two right characteristic vectors which can be taken as $(\frac13, 0, \frac13, \frac13)$ and $(0, 1, 0, 0)$. Then $(T_1T_2)^n p_0$ will have a limiting behaviour which depends on $p_0$.

It would, therefore, be of great interest to investigate what conditions, imposed either on the order of the operations or on the matrices themselves, will result in convergence to a unique vector independent of the initial conditions. In particular, this would have useful application to the case where the states are the $n!$ permutations of $n$ objects and the operations are probability mixtures of the permutation operations.


This work was partly carried out whilst the author was at Princeton University under contract with the Office of Naval Research.
MISCELLANEOUS


APPARENT ANOMALIES AND IRREGULARITIES IN MAXIMUM 
LIKELIHOOD ESTIMATION* 
(with discussion) 


By C. RADHAKRISHNA RAO
Indian Statistical Institute, Calcutta. 


1. INTRODUCTION 
Maximum likelihood (m.l.) estimation is criticised mainly on the following grounds: 
(i) It does not always provide consistent estimates. 


(ii) There exist estimates with lower asymptotic variance than that of the m.l. estimate, and therefore the m.l. method does not lead to most efficient estimates as claimed in the literature on this subject.


(iii) The computations involved in determining m.l. estimates are in most cases unduly heavy. On the other hand, there exist simpler methods of estimation which provide estimates which are asymptotically as efficient as m.l. estimates.


(iv) No adequate justification has been put forward for m.l. estimation in finite samples. Judged by the criterion of mean squared error in finite samples, there are examples where certain other procedures are better than m.l.


The criticism (iii) on grounds of computational difficulties will be relatively un- 
important when high speed electronic computers become easily available for use by research 
workers. The computations, involving an iterative procedure for solving m.l. equations 
and inversion of matrices for obtaining standard errors, can be easily programmed on any modern electronic computer. Recently, routine programmes have been constructed at the Indian Statistical Institute for obtaining m.l. estimates of gene frequencies, standard errors, expected frequencies and goodness of fit χ², from observed phenotypic frequencies of various blood group systems such as OAB, MN, CDE, etc. The time taken for these computations is of the order of a minute for each blood group system, even on a comparatively slow machine like the HEC (Hollerith Electronic Computer).

I shall, therefore, confine my comments to the other points of criticism relating to consistency, efficiency and properties of estimates in small samples.




2. PURPOSE OF ESTIMATION 


It will help in our discussion if we agree on the purpose of estimation, on which will depend the criteria for the choice of a suitable method of estimation. Much of the controversy in the literature on estimation could be dismissed once this problem is properly answered. There has been a tendency to consider estimation as a part of decision theory, which requires as a datum of the problem the specification of the loss for a given difference between the estimate and the true value of the unknown parameter. The criterion in such a case is naturally
the minimisation of expected loss. This may be appropriate in certain situations but I 


* This paper, originally presented at the 32nd Session of the International Statistical Institute, is being reprinted here with the kind permission of the Editor, Bulletin of the International Statistical Institute.
am not sure whether one can support Berkson (1955) when he wants to estimate the slope of the probit regression line in a bio-assay using the criterion of minimum expected squared error, unless of course he believes or makes us believe that the loss to society is proportional to the square of the error in his estimate. I suppose a bio-assayer, when he obtains an estimate of the standard deviation of a tolerance distribution, or the LD 50, uses it in a variety of ways besides playing a game with nature or with society. He would like to compare it


with an estimate of LD 50 for another insecticide, combine it with a previous estimate for the same insecticide to obtain a better estimate, preserve it for comparison or combination with future estimates, or indulge in some assertions (with some confidence) that LD 50 is less than a specified value or lies between two specified values and so on, or use the estimate itself more conveniently in place of the basic data in reaching optimum decisions for a specified loss function.


It may be argued that all these problems could be answered directly, and in theory more satisfactorily, from given data without considering the intermediate methodological problem of estimation. If then we insist on estimating the unknown parameters and use the estimates for purposes of inference such as those indicated above, it can only be due to some convenience in handling the estimates rather than the original data, in addition to the resulting economy in recording only the estimates for future use, instead of preserving the entire mass of observed data, much of which may be irrelevant. If, therefore, we define the purpose of estimation as condensation of data, what criteria can we lay down for choosing a method of estimation?


Most statisticians would probably agree that statistical inference consists, in general, in discriminating between alternative possible situations on the basis of given data, and as such it should be based on the likelihood $P(S, \theta)$ of the parameter $\theta$ given the sample $S$, which is the same as the probability (or probability density) of $S$ given $\theta$. More precisely, we need the ratio of the likelihoods for two given values $\theta_1$ and $\theta_2$ of the parameter. There may be, however, some controversy about the form in which the uncertainty in the choice of $\theta_1$ or $\theta_2$, given $S$, is to be expressed.


If there exists a statistic $T$ such that

$$\frac{P(S, \theta_1)}{P(S, \theta_2)} = \frac{P(T, \theta_1)}{P(T, \theta_2)} \tag{2.1}$$

for all admissible $\theta_1$ and $\theta_2$, nothing is lost by replacing the sample $S$ by the statistic $T$, which is for all relevant purposes equivalent to $S$. Such a statistic $T$ is said to be sufficient in the sense of Fisher (1922). There will be a multiplicity of statistics $T$ satisfying (2.1), one of them (in the extended sense of the term statistic) being the sample itself. In general we can choose one among them, say $T_0$, which is minimal in the sense that $T_0$ is essentially a function of every sufficient statistic $T$ (Lehmann and Scheffé, 1950). A minimal sufficient statistic thus provides an exhaustive summary of the sample for purposes of statistical inference. If $x_1, \ldots, x_n$ is a sample of observations, the observed mean $\bar{x}$ and variance $s^2$ are jointly sufficient when the population distribution is normal with unknown mean and variance.


¹ A simple example given by Silverstone (1957) illustrates the point. In the case of the ordinary binomial distribution with probability $\theta$, … value of $\theta$.
On the other hand, the minimal sufficient statistic may be the ordered values in the whole sample, as in the case of a Cauchy population with an unknown location parameter, i.e. no reduction of the data is possible without disturbing the relation (2.1). In such a case we may look for a statistic which belongs to a specified simple class and provides the maximum possible discrimination.


For a statistic $T$ belonging to a specified class, let us denote the likelihood ratio by $P(T, \theta_1)/P(T, \theta_2)$. The larger the deviation of this ratio from 1, the greater will be the discrimination between the parameters $\theta_1$ and $\theta_2$. For purposes of comparison, it is convenient to have a measure of the amount of discrimination provided by a statistic $T$. There is, indeed, some amount of arbitrariness in the choice of such a measure. Some natural measures are 'the amount of overlap' between distributions corresponding to $\theta_1$ and $\theta_2$, as defined by the author (Rao, 1948), or the quantity

$$J_T(\theta_1, \theta_2) = E_{\theta_1} \log \frac{P(T, \theta_1)}{P(T, \theta_2)} + E_{\theta_2} \log \frac{P(T, \theta_2)}{P(T, \theta_1)} \tag{2.2}$$

considered by Kullback and Leibler (1951) following the concepts of information theory. One may prefer even a pure distance measure like

$$\rho_T(\theta_1, \theta_2) = \int \sqrt{P(T, \theta_1)\, P(T, \theta_2)}\; dv \tag{2.3}$$
introduced by Hellinger (1909) (see also Bhattacharya, 1946). Each of these measures indicates no more discrimination than the corresponding expression when $T$ is replaced by the whole sample. The ratio of the amount of discrimination provided by $T$ to that contained in the whole sample may be considered as an index of the effectiveness of $T$. When $T$ is sufficient this ratio is unity for all these measures. For simplicity, let us consider $J_T(\theta_1, \theta_2)$ in the further discussion, observing that the same or similar results will be valid for the other measures mentioned.
When the sample $S$ consists of $n$ independent observations on a variate $X$, we have

$$J_S(\theta_1, \theta_2) = n \left\{ E_{\theta_1} \log \frac{P(X, \theta_1)}{P(X, \theta_2)} + E_{\theta_2} \log \frac{P(X, \theta_2)}{P(X, \theta_1)} \right\} \tag{2.4}$$

where the expression within the brackets is the value of $J(\theta_1, \theta_2)$ for a single observation and is therefore independent of $n$. As $n \to \infty$, $J_S(\theta_1, \theta_2) \to \infty$, and we have perfect discrimination between $\theta_1$ and $\theta_2$, as was shown by Basu (1954). A rigorous demonstration of this result was given earlier by Kakutani (1948) using the fact that $\rho_S(\theta_1, \theta_2) \to 0$ as $n \to \infty$. He showed that the distributions of the sample sequences in the infinite dimensional space for two different values of $\theta$ are 'orthogonal'. A statistic $T_n$ which replaces a sample would not be of much use if it did not provide complete discrimination between any two values of $\theta$ as $n \to \infty$, i.e., if $P(T_n, \theta_1)$ and $P(T_n, \theta_2)$ are not orthogonal in the limit. This is possible if² $T_n \to \phi(\theta)$ with probability 1 as $n \to \infty$, where $\phi(\theta)$ is a function of $\theta$ having one-to-one correspondence with the possible values of $\theta$. This is exactly what the criterion of consistency³ laid down by Fisher (1922) demands.
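A minimal Python sketch of (2.2) to (2.4) and of Kakutani's argument follows. It assumes, for illustration only, a normal population $N(\theta, 1)$, a choice not made in the text. For one observation the J-divergence is $(\theta_1 - \theta_2)^2$ and the affinity (2.3) is $\exp\{-(\theta_1 - \theta_2)^2/8\}$. Over $n$ independent observations J adds while the affinity multiplies, so $J_S \to \infty$ and $\rho_S \to 0$. The function names are hypothetical.

```python
import math

def j_divergence_normal(theta1: float, theta2: float, n: int = 1) -> float:
    """J of (2.2) for n independent N(theta, 1) observations.

    Each Kullback-Leibler term equals (theta1 - theta2)**2 / 2 per observation,
    and J adds over independent observations, as in (2.4).
    """
    return n * (theta1 - theta2) ** 2

def affinity_normal(theta1: float, theta2: float, n: int = 1) -> float:
    """Affinity rho of (2.3) for n independent N(theta, 1) observations.

    For product densities the integral factorises, so rho_S = rho ** n.
    """
    return math.exp(-((theta1 - theta2) ** 2) / 8) ** n

for n in (1, 10, 100, 1000):
    print(n, j_divergence_normal(0.0, 0.2, n), affinity_normal(0.0, 0.2, n))
```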


² Generally, orthogonality is possible only if $T_n$ tends to a particular value, but examples may be found where, for each $\theta$, $T_n$ has a non-degenerate limiting distribution, with distributions corresponding to different values of $\theta$ being non-overlapping.

³ We are not demanding that $T_n \to \theta$. It is enough if, for any two different values of $\theta$, $T_n$ tends to two different constants. In such a case $T_n$ is defined to be consistent for $\theta$ in the wide sense (see Section 4 of this paper).
For given $n$, $J_S(\theta_1, \theta_2)$ and $J_T(\theta_1, \theta_2)$ depend on how widely separated the distributions corresponding to $\theta_1$ and $\theta_2$ are. Therefore, the ratio of $J_T(\theta_1, \theta_2)$ to $J_S(\theta_1, \theta_2)$ may not represent the true effect of replacing $S$ by $T$ if the distributions corresponding to $\theta_1$ and $\theta_2$ are widely different. We may, therefore, consider the ratio of these quantities as $\theta_2 \to \theta_1$, assuming that this implies closeness of distributions. It is easy to see that

$$J_S(\theta_1, \theta_1 + \delta\theta_1) \simeq n\, i(\theta_1)(\delta\theta_1)^2, \qquad J_T(\theta_1, \theta_1 + \delta\theta_1) \simeq n\, i_T(\theta_1)(\delta\theta_1)^2 \tag{2.5}$$

where $i(\theta_1) = E_{\theta_1}[P'(x, \theta_1)/P(x, \theta_1)]^2$ is the information per observation as defined by Fisher (1922, 1925), and $i_T(\theta_1)$ is the corresponding information per observation in the statistic $T$. It is shown by Fisher (1925) that $i_T(\theta_1) \le i(\theta_1)$, which suggests the criterion of maximising the information per observation in the choice of a statistic.
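The approximation (2.5) can be checked numerically in a case where J has a closed form. The sketch below assumes a Poisson population with mean $\lambda$, an illustrative choice. For this population $J(\lambda_1, \lambda_2) = (\lambda_1 - \lambda_2)\log(\lambda_1/\lambda_2)$ per observation and $i(\lambda) = 1/\lambda$. The ratio $J/(\delta\lambda)^2$ should therefore approach $i(\lambda)$ as $\delta\lambda \to 0$.

```python
import math

def j_poisson(lam1: float, lam2: float) -> float:
    """Exact J (2.2) between Poisson(lam1) and Poisson(lam2) for one observation:
    KL(1 || 2) + KL(2 || 1) = (lam1 - lam2) * log(lam1 / lam2)."""
    return (lam1 - lam2) * math.log(lam1 / lam2)

def fisher_info_poisson(lam: float) -> float:
    """Information per observation, i(lam) = E[(d/dlam log P)^2] = 1 / lam."""
    return 1.0 / lam

lam = 3.0
for delta in (0.5, 0.1, 0.01):
    # (2.5) with n = 1: J(lam, lam + delta) is approximately i(lam) * delta**2
    print(delta, j_poisson(lam, lam + delta) / delta ** 2, fisher_info_poisson(lam))
```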


Reference may also be made to earlier work by the author (Rao, 1945) where the distance between two distributions differing by small quantities in the parameters is derived, by an argument similar to that used here, as a quadratic differential metric of which (2.5) is a special case. In the general case $i(\theta_1)$ and $i_T(\theta_1)$ are matrices, and it is known that $[i(\theta_1) - i_T(\theta_1)]$ is a positive semi-definite matrix. The efficiency of a statistic may be measured by some expression reflecting the deviations from zero of the elements in the matrix $[i(\theta_1) - i_T(\theta_1)]$. We shall consider only the single parameter case in further discussions.


The information function seldom provides a complete ordering of the statistics for all values of $\theta$ in the admissible range. It is, of course, possible to obtain a complete ordering with respect to the average amount of information based on an a priori distribution of $\theta$, if this last distribution can be specified. In other situations no satisfactory solution seems to exist, although information can be used to eliminate some statistics which are worse than others in a range of the parameters in which we are interested. Fortunately, under favourable circumstances, there exist methods of estimation for which $i_T(\theta) \to i(\theta)$ as $n \to \infty$, so that we have an assurance that, at least in large samples, the relative information lost is small.


In small samples, we could examine the performance of any statistic by computing the ratio $i_T/i$. If this quantity is small, we need not insist on replacing the observations $S$ by the statistic $T$, but may strengthen $T$ by considering other statistics in addition to $T$, so that all taken together provide information per observation comparable to $i$. In the worst case, when the sample size is small, it may be necessary to retain the entire sample or the likelihood function, either in the form of a graph or tabulated for some values of the parameters, which would enable us to reconstruct the function without much error if needed in future.


3. EFFICIENCY

3.1. A new formulation of the concept of efficiency.


Having discussed certain broad principles for summarising data, we may examine some easily recognisable properties of statistics by which we can judge their effectiveness, and discuss methods by which such statistics are obtained.




APPARENT ANOMALIES AND IRREGULARITIES IN M. 1. ESTIMATION 


Let us consider the consequences of replacing the observations by a statistic in discriminating an alternative value of a parameter close to a specified value $\theta$. It was shown by Rao and Poti (1946), by an application of the important lemma of Neyman and Pearson, that a test which best discriminates small departures from a given value of $\theta$, for any given $n$, is of the form: 'Reject if and only if $Z_n > A$', where

$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i, \qquad z_i = \frac{P'(x_i, \theta)}{P(x_i, \theta)}, \tag{3.11}$$

$A$ is a constant, $P(x, \theta)$ is the probability density of $x$, and $x_1, \ldots, x_n$ are $n$ independent observations. Or, if we denote by $L$ the likelihood of the parameter given the sample, the statistic $Z_n$ is simply $(d \log L/d\theta)/\sqrt{n}$.


Can we construct a statistic independent of $\theta$ and with a performance⁴ as good as that of $Z_n$? This is possible when there exists a function $T_n$ of the observations such that

$$Z_n = \lambda(\theta)\, T_n + \mu(\theta) \tag{3.12}$$

or, more generally, when the variance of $Z_n$ given $T_n$ is zero, a situation in which $T_n$ is sufficient for $\theta$. On the other hand, it may be possible to construct a statistic such that its asymptotic correlation with $Z_n$ is unity as $n \to \infty$. Such a statistic, if it exists,⁵ is as good as $Z_n$ in sufficiently large samples, i.e., it is best for discrimination between two neighbouring values of the parameter in sufficiently large samples. Based on these considerations we give a new formulation of the concept of efficiency.


Definitions. A statistic is said to be efficient if its asymptotic correlation with the derivative of the log likelihood is unity. The efficiency of any statistic may be measured by $\rho^2$, where $\rho$ is its asymptotic correlation with $Z_n$.

In the case of more than one unknown parameter, a statistic consistent for a parameter is said to be efficient if its multiple correlation with the derivatives of the log likelihood with respect to the unknown parameters is unity. The efficiency of any statistic is measured by the square of the multiple correlation.


3.2. 'Super efficient' estimates and their efficiency.

An efficient statistic is defined by Fisher (1922) as one whose asymptotic variance is $[n\, i(\theta)]^{-1}$, or alternatively as one whose asymptotic variance is the least. Although Fisher formally stated the criterion of efficiency in terms of least asymptotic variance, it is clear from his writings that by an efficient estimate he meant a statistic for which the loss of information per observation tends to zero. Fisher gives the following extended definition of efficiency on page 714 of his 1925 paper: 'The efficiency of a statistic is the ratio of the intrinsic accuracy of its random sampling distribution to the amount of information in the data from which it has been derived.' He argued that since the reciprocal of information for the mean of the


⁴ The power of the test based on $Z_n$, when $n$ is large and the alternative to $\theta$ is $\theta + \delta\theta$, is nearly $\Phi[\{n\, i(\theta)\}^{1/2}\, \delta\theta]$, where $\Phi$ is an increasing function of the argument. The quantity $i(\theta)$, which appears in the expression (2.5) for the distance between distributions close to one another, is also explicitly involved in the power function.
⁵ For instance, the unique consistent root $T_n^*$ of the m.l. equation (ref. Huzurbazar, 1948), under the conditions given by Doob (1934) or Cramér (1946), satisfies that property, for $|\sqrt{n}(T_n^* - \theta) - Z_n/i(\theta)| \to 0$ with probability 1. An m.l. estimate, when referred to in the sequel, is assumed to have this property.
distribution is the variance when the distribution is normal, and the information in a statistic is bounded above by $n\, i(\theta)$, an efficient statistic is recognised when it has a limiting normal distribution with variance $[n\, i(\theta)]^{-1}$, which is the least possible. In 1951, J. L. Hodges⁶ and later Le Cam (1953) constructed examples of consistent estimates with an asymptotic variance $\le [n\, i(\theta)]^{-1}$, with strict inequality for certain values of $\theta$. In fact, at these exceptional points the asymptotic variance can be made arbitrarily small. These examples of what are called 'super efficient' estimates show that there is no non-zero lower bound to the asymptotic variance of a consistent estimate, contrary to what is stated by Fisher. One might think that super efficient estimates with asymptotic variance $\le [n\, i(\theta)]^{-1}$ should be preferred to efficient estimates with asymptotic variance $[n\, i(\theta)]^{-1}$. We shall examine these notions in the light of the new definition of efficiency given here.


First, it may be noted that super efficiency arises when the statistic is not an explicit function of the sample distribution function and therefore does not satisfy the consistency condition as originally defined by Fisher.⁷ Assuming Fisher consistency (FC) and certain regularity conditions (mainly Fréchet differentiability) on the statistic, Kallianpur and Rao (1955) demonstrated that $[n\, i(\theta)]^{-1}$ is, indeed, a lower bound to the asymptotic variance, thus justifying Fisher's argument. It is also deducible from the results of Kallianpur and Rao that an FC statistic with asymptotic variance $[n\, i(\theta)]^{-1}$, under the regularity conditions assumed, has asymptotic correlation unity with $Z_n$. This demonstrates the equivalence of Fisher's definition of efficiency with that proposed here, under the regularity conditions imposed on the estimate. Earlier work by Neyman (1949) and Barankin and Gurland (1950) also tends to confirm Fisher's results.


Now let us see how the new definition of efficiency enables us to judge the effectiveness of any statistic, whether it satisfies regularity conditions or not. What happens when FC and other regularity conditions imposed on the statistic are not satisfied? In this case super efficient estimates do exist, as shown by Hodges and Le Cam. We shall show that when a super efficient estimate (i.e. one with a possibly smaller asymptotic variance than that of the m.l. estimate) exists, one of the following two possibilities holds.


⁶ The example by Hodges is quoted in a paper by Le Cam (1953). Consider the mean $\bar{X}_n$ of $n$ independent observations on $X$ from a normal distribution with mean $\theta$ and standard deviation unity. As is well known, $\bar{X}_n$ is the m.l. estimate of the mean, with variance $\sigma^2 = 1/n$. Let $T_n$ be the function defined by

$$T_n = \begin{cases} \bar{X}_n & \text{if } |\bar{X}_n| \ge n^{-1/4}, \\ \alpha \bar{X}_n & \text{if } |\bar{X}_n| < n^{-1/4}. \end{cases}$$

It is easy to see that $T_n$ is asymptotically normally distributed about $\theta$, with variance $1/n$ for $\theta \ne 0$ and $\alpha^2/n$ for $\theta = 0$. Since $\alpha$ is arbitrary, the asymptotic variance at $\theta = 0$ is less than that of the m.l. estimate whenever $|\alpha| < 1$. This example of Hodges was generalized by Le Cam to improve the asymptotic variance at a countable number of values of the parameter $\theta$.

⁷ If $S_n$ denotes the sample distribution function and $F(\theta)$ the true distribution function, a functional $f(S_n)$ is Fisher consistent (FC) for $\theta$ if $f[F(\theta)] \equiv \theta$. For a discussion of this subject see Kallianpur and Rao (1955). The estimates of Hodges and Le Cam are not FC.
(1) In large samples, it is equivalent to the m.l. estimate (which is efficient in the new sense), i.e., it has asymptotic correlation unity with the m.l. estimate, and therefore it is efficient in the new sense.

(2) It is not efficient in the new sense, in which case it is definitely worse than the m.l. estimate for purposes of inference such as testing of hypotheses, interval estimation, etc.

Let us assume that a statistic $T_n$ consistent for $\theta$ is such that $\sqrt{n}(T_n - \theta)$ and $Z_n$ have a joint asymptotic distribution. Denote the asymptotic variance of $\sqrt{n}(T_n - \theta)$ by $v(T)$ and its asymptotic covariance with $Z_n$ by $a(\theta)$. Since the asymptotic variance of $Z_n$ is $i(\theta)$, we have the obvious inequality

$$v(T) \ge \frac{a^2(\theta)}{i(\theta)}. \tag{3.21}$$

From this relation, it follows that $T_n$ has asymptotic correlation unity with $Z_n$, i.e. it is fully efficient (in the new sense), if and only if $v(T) = a^2(\theta)/i(\theta)$. If $T_n^*$ is an m.l. estimate for which the observation made in footnote 5 is true, then $\sqrt{n}(T_n^* - \theta)$ has asymptotic variance equal to $1/i(\theta)$ and asymptotic correlation unity with $Z_n$. Therefore, when the equality in (3.21) is attained, $T_n$ and $T_n^*$ have asymptotic correlation unity, whatever may be the inequality satisfied between their asymptotic variances,⁸ $a^2(\theta)/i(\theta)$ and $1/i(\theta)$. If $a^2(\theta) \le 1$ for all $\theta$, with strict inequality for some $\theta$, we have an example of super efficiency, as in the case of Hodges' example.⁹ In fact, we can use the device of Hodges to construct examples of 'sub efficiency', i.e. where $a^2(\theta) > 1$. In either case, when the equality in (3.21) is attained, $T_n$ is equivalent to the m.l. estimate $T_n^*$ in the sense that essentially the same type of inference is possible by using $T_n$ or $T_n^*$ in large samples, whether $T_n$ is super or sub efficient in the earlier sense.

We may now ask what happens when the equality in (3.21) is not attained. Such a statistic $T_n$ has asymptotic correlation $-1 < \rho < 1$ with $Z_n$, and therefore is not as good as $Z_n$ (and therefore not as good as the m.l. estimate) for local discrimination, although $T_n$ may be super efficient, i.e.

$$v(T) < \frac{1}{i(\theta)}. \tag{3.22}$$

Consider, for example, a sample $x_1, \ldots, x_n$ from a normal distribution with an unknown mean $\mu$ and variance unity, and denote by $\bar{x}$ and $m_n$ the sample mean and median respectively. Define the statistic

$$T = \begin{cases} \alpha\, m_n & \text{if } |\bar{x}| < n^{-1/4}, \\ \bar{x} & \text{if } |\bar{x}| \ge n^{-1/4}. \end{cases} \tag{3.23}$$

It is easy to see that the asymptotic distribution of $\sqrt{n}(T - \mu)$ is normal with variance $\alpha^2\pi/2$ when $\mu = 0$, and 1 when $\mu \ne 0$. By choosing $\alpha$ sufficiently small, $\alpha^2\pi/2$ can be made less than 1. The statistic $T$ is therefore super efficient. But for testing the hypothesis $\mu = 0$, it is obvious that the test criterion is essentially the median when the null hypothesis is true, and consequently the power of the test is smaller than that of $\bar{x}$. In this connection we may also refer


⁸ We may compare this result with that of Fisher (1925), that the asymptotic correlation between two efficient estimates having the same least asymptotic variance is unity.

⁹ It may be observed from the example given in footnote 6 that $T_n$ and $\bar{X}_n$ have asymptotic covariance $\alpha$ for $\theta = 0$ and 1 for $\theta \ne 0$. The value of $\alpha$ can be chosen to be $> 1$ or $< 1$ arbitrarily. The technique of Hodges and Le Cam provides a statistic which is essentially equivalent to the statistic with which they start.
to an interesting but different type of example due to Basu (1956), where the ratio of the limiting variance of one statistic to that of another $\to \infty$, but the corresponding ratio of the probabilities of concentration within any given limits of the true value $\to 0$. So it would appear that the criterion of minimum asymptotic variance is misleading.


We may now raise the problem whether, given a super efficient estimate, it is possible to find a function of the m.l. estimate which is consistent for the parameter and which has a smaller asymptotic variance than the given super efficient estimate. This would mean that a super efficient estimate can be uniformly improved, from the point of view of asymptotic variance, by using a function of the m.l. estimate. This is true of the known examples of super efficiency. Further, given a super efficient estimate, i.e., when $v(T)$ satisfies (3.22), we can construct a function of the m.l. estimate, by using Le Cam's technique, such that its asymptotic variance is smaller than $v(T)$, or even $a^2(\theta)/i(\theta)$, at a countable set of values of $\theta$. To examine whether improvement is possible for all values of $\theta$, we have to study the function $a(\theta)$. Under some assumptions Le Cam (1953) proved that $|a(\theta)|$ can be less than unity only for a set of points of Lebesgue measure zero. This is encouraging but does not solve the problem posed here. We may have to explore the asymptotic sufficiency of the m.l. estimate (Wald, 1943; Le Cam, 1953) to prove this property.


3.3. Information in the limit. 


We shall examine the limiting information contained in an efficient estimate, i.e., one which has asymptotic correlation unity with the first derivative of the log likelihood. Let $T_n$ be such an estimate, whether it is super or sub efficient with respect to the asymptotic variance. Suppose further that

$$\big(\sqrt{n}(T_n - \theta),\, Z_n\big) \to (T, Z) \quad \text{in distribution}, \tag{3.31}$$

where $(T, Z)$ is bivariate normal with mean zero and covariance matrix

$$\begin{pmatrix} a^2(\theta)/i(\theta) & a(\theta) \\ a(\theta) & i(\theta) \end{pmatrix}. \tag{3.32}$$

Then the variable $[T - \{a(\theta)/i(\theta)\}\, Z]$ has zero variance. Therefore

$$\sqrt{n}(T_n - \theta) - \frac{a(\theta)}{i(\theta)}\, Z_n \to 0 \quad \text{in probability}. \tag{3.33}$$
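The zero variance asserted before (3.33) follows directly from the entries of the covariance matrix (3.32). Equivalently, the determinant of (3.32), $a^2(\theta) - a^2(\theta)$, vanishes:

$$\operatorname{Var}\Big(T - \frac{a(\theta)}{i(\theta)} Z\Big) = \frac{a^2(\theta)}{i(\theta)} - 2\,\frac{a(\theta)}{i(\theta)}\,a(\theta) + \frac{a^2(\theta)}{i^2(\theta)}\,i(\theta) = 0.$$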
HR 


For a statistic $T_n$ which satisfies the condition (3.33), Doob (1936) has demonstrated,¹⁰ under some regularity conditions on $P(x, \theta)$, the probability (or density) of a single observation, that

$$\lim_{n \to \infty} i_{T_n}(\theta) = i(\theta), \tag{3.34}$$

where $i_{T_n}(\theta)$ is the information per observation contained in the statistic $T_n$, computed in … A simple proof of Doob's proposition is given in a recent paper by the author (Rao, 1960).

¹⁰ Doob (1936) states the required condition in terms of strong convergence. I believe this is not
We have no such assurance about the limiting information in the case of estimates not efficient in the new sense. In fact, if the asymptotic correlation between $T_n$ and $Z_n$ is $\rho$, and that between $T_n$ and $P'(T_n, \theta)/P(T_n, \theta)$ is nearly unity (as may be expected), we have the relation

$$\lim_{n \to \infty} i_{T_n}(\theta) = \rho^2\, i(\theta), \tag{3.35}$$

emphasizing the importance of $\rho^2$ as a measure of efficiency, as mentioned in the definition of Section 3.1. The use of $T_n$ entails a loss of information equal to that contained in a fraction $(1 - \rho^2)$ of the observations.
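Relation (3.35) gives a simple way to express the loss as a number of observations. As an arithmetic illustration, assume the normal-median value $\rho^2 = 2/\pi$, a standard result that is not taken from the text. A median computed from 100 observations then carries, in the limit, about as much information as the mean of 64. The helper name is hypothetical.

```python
import math

def equivalent_sample_size(n: int, rho_sq: float) -> float:
    """By (3.35), a statistic with efficiency rho_sq retains in the limit the
    information of about n * rho_sq observations; n * (1 - rho_sq) are lost."""
    return n * rho_sq

rho_sq_median = 2 / math.pi  # assumed: efficiency of the median for normal data
print(equivalent_sample_size(100, rho_sq_median))  # roughly 63.7
```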


3.4. Efficiency in non-regular cases. 


In non-regular cases, such as the rectangular distribution over the range $(0, \theta)$, i.e., where the probability measures corresponding to different values of the parameter are not equivalent, the quantity $i(\theta)$ is not properly defined, so that the foregoing theory is not applicable. We shall not discuss such situations in full generality but only consider a special example given by Basu (1952), where the maximum likelihood estimate has a uniformly larger variance than an alternative estimate proposed by him.

Let $x_1, \ldots, x_n$ be $n$ observations from a rectangular distribution in the range $(\theta, 2\theta)$, where $0 < \theta < \infty$. The maximum $y$ and the minimum $z$ of the observations are jointly sufficient for $\theta$, and the m.l. estimate of $\theta$ is $T_1 = y/2$. The asymptotic variance of $T_1$ is $\theta^2/4n^2$, while that of $T_2 = (2y + z)/5$, which is also consistent for $\theta$, is $\theta^2/5n^2$. Judged by the criterion of the ratio of asymptotic variances, the m.l. estimate has only 80% efficiency compared to the alternative estimate. One might be tempted to infer that discrimination based on $T_2$ is therefore better than that based on $T_1$, the m.l. estimate, for small differences in the parameter. A computation of the power functions of the tests based on $T_1$ and $T_2$ for any sample size shows, however, that for alternatives close to a given value of $\theta$ the power of $T_1$ is much higher than that of $T_2$, although $T_2$ has smaller asymptotic variance than $T_1$. On the other hand, for alternatives not close to a given value of $\theta$, $T_2$ is better than $T_1$.
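The variance claims for Basu's example can be checked by simulation. The sketch assumes $\theta = 1$, $n = 200$ and an arbitrary seed. It estimates $n^2$ times the variances of $T_1 = y/2$ and $T_2 = (2y + z)/5$, which should be close to $\theta^2/4$ and $\theta^2/5$. It does not reproduce the power comparison described above.

```python
import random
import statistics

def basu_estimates(theta: float, n: int, rng: random.Random):
    """Draw n observations from the rectangular distribution on (theta, 2*theta)
    and return (T1, T2): the m.l. estimate y/2 and the alternative (2y + z)/5."""
    xs = [rng.uniform(theta, 2 * theta) for _ in range(n)]
    y, z = max(xs), min(xs)
    return y / 2, (2 * y + z) / 5

def scaled_variances(theta: float = 1.0, n: int = 200, reps: int = 4000, seed: int = 4):
    """Monte Carlo n**2 * Var(T1) and n**2 * Var(T2); asymptotically theta**2/4 and theta**2/5."""
    rng = random.Random(seed)
    pairs = [basu_estimates(theta, n, rng) for _ in range(reps)]
    v1 = statistics.variance([t1 for t1, _ in pairs])
    v2 = statistics.variance([t2 for _, t2 in pairs])
    return n ** 2 * v1, n ** 2 * v2

print(scaled_variances())
```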


3.5. Concentration. 

As is stated above, in the absence of regularity conditions on an estimate $T_n$, the asymptotic or actual variance of $T_n$ does not necessarily give a good indication of the concentration of $T_n$ about the true value of $\theta$. An approach to estimation which is concerned explicitly with comparing concentrations has been given recently by Bahadur (1960). This approach may be outlined as follows. Let $T_n$ be a consistent and asymptotically normal estimate of $\theta$ based on $n$ independent and identically distributed observations. For any $n$ and any $\varepsilon > 0$, let $\tau = \tau(T_n, \varepsilon, \theta)$ be defined by the equation

$$P(|T_n - \theta| \ge \varepsilon \mid \theta) = 2 \int_{\varepsilon/\tau}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du. \tag{3.51}$$

$\tau$ is called the 'effective standard deviation' of $T_n$ when $\theta$ obtains. It is shown by Bahadur, under mild regularity conditions on the sample space of a single observation, that

$$\liminf_{\varepsilon \to 0}\; \liminf_{n \to \infty}\; n\, \tau^2(T_n, \varepsilon, \theta) \ge \frac{1}{i(\theta)} \tag{3.52}$$
provided only that $T_n$ is consistent. He also shows, under stronger regularity conditions, that the equality holds in (3.52) when $T_n$ is the m.l. estimate of $\theta$. The appearance of Fisher's measure of information in this analysis provides further evidence that this measure is of central importance to estimation.
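Equation (3.51) can be solved for $\tau$ once the tail probability is known. Writing $p = P(|T_n - \theta| \ge \varepsilon \mid \theta)$ and $\Phi$ for the standard normal distribution function, $\tau = \varepsilon / \Phi^{-1}(1 - p/2)$. The sketch below uses Python's `statistics.NormalDist` and assumes, for illustration, that $T_n$ is the mean of $n$ observations from $N(\theta, 1)$. Because that mean is exactly normal, $n\tau^2 = 1 = 1/i(\theta)$ for every $\varepsilon$, which is the equality case of (3.52).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, Phi

def effective_sd(tail_prob: float, eps: float) -> float:
    """Solve (3.51) for tau, given p = P(|T_n - theta| >= eps | theta).

    The right side of (3.51) equals 2 * (1 - Phi(eps / tau)), so
    tau = eps / Phi^{-1}(1 - p / 2).
    """
    return eps / STD_NORMAL.inv_cdf(1 - tail_prob / 2)

def tail_prob_of_mean(n: int, eps: float) -> float:
    """P(|xbar - theta| >= eps) when xbar is the mean of n N(theta, 1) observations."""
    return 2 * (1 - STD_NORMAL.cdf(eps * n ** 0.5))

n, eps = 50, 0.3
tau = effective_sd(tail_prob_of_mean(n, eps), eps)
print(n * tau ** 2)  # 1.0 up to rounding, matching 1 / i(theta) = 1
```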


3.6. Concluding remarks on efficiency. 


We observe that $Z_n$, for each $n$, has the maximum local power of discrimination between two neighbouring values (Rao and Poti, 1946), and we demand the existence of a statistic $T_n$ independent of $\theta$ and having asymptotic correlation unity with $Z_n$. This ensures that $T_n$ has the same local properties as $Z_n$. Further, it is shown by Wald (1942) that asymptotically shortest confidence intervals can be obtained by inverting regions of the type $Z_n(\theta) > A_n(\theta)$, $Z_n(\theta) < B_n(\theta)$ (one-sided regions) and $|Z_n(\theta)| > C_n(\theta)$ (two-sided regions). It is clear that any statistic having asymptotic correlation unity with $Z_n$ has the same property in large samples.
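The following sketch illustrates how a two-sided region $|Z_n(\theta)| \le C_n(\theta)$ is inverted. It assumes a Poisson model, which is not discussed in the text, and the choice $C_n(\theta) = z\sqrt{i(\theta)}$, where $z$ is a standard normal quantile. Under these assumptions $Z_n(\theta) = \sqrt{n}(\bar{x} - \theta)/\theta$ and $i(\theta) = 1/\theta$. The acceptance region $n(\bar{x} - \theta)^2 \le z^2\theta$ is then an interval whose endpoints are the roots of a quadratic in $\theta$. This is what is now usually called a score interval. It is a sketch only, not Wald's (1942) construction in its full generality.

```python
import math
from statistics import NormalDist

def score_interval_poisson(counts, level: float = 0.95):
    """Confidence interval for a Poisson mean by inverting |Z_n(theta)| <= C_n(theta)
    with C_n(theta) = z * sqrt(i(theta)).

    Here Z_n(theta) = sqrt(n) * (xbar - theta) / theta and i(theta) = 1 / theta, so the
    acceptance region is n * (xbar - theta)**2 <= z**2 * theta, i.e. the roots of
    n * theta**2 - (2 * n * xbar + z**2) * theta + n * xbar**2 = 0 bound the interval.
    """
    n = len(counts)
    xbar = sum(counts) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = xbar + z ** 2 / (2 * n)
    half_width = z * math.sqrt(4 * n * xbar + z ** 2) / (2 * n)
    return centre - half_width, centre + half_width

print(score_interval_poisson([3, 1, 4, 1, 5, 9, 2, 6]))
```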

It is immaterial what the asymptotic variance of the statistic is, provided its asymptotic correlation with $Z_n$ is unity. It may be super or sub efficient, in the sense of having smaller or higher asymptotic variance than $[n\, i(\theta)]^{-1}$. If we are placing emphasis on the asymptotic correlation with $Z_n$ being unity, we can achieve this by restricting the class of statistics to well-behaved functions of observations. This is for convenience in drawing inferences on $\theta$ given the statistic. The m.l. estimate, under some conditions, satisfies our requirements.

Le Cam (1953) suggests asymptotic variance as a measure of concentration of the statistic round the true value in large samples. It may be argued that our interest does not lie in such a measure of concentration. But it is observed that even with respect to such a measure, so far as the existing illustrations suggest, a function of the m.l. estimate serves the purpose.


4. CONSISTENCY 


A number of quite different examples of inconsistency of m.l. estimates are now available (Neyman and Scott, 1948; Basu, 1955; Kraft and Le Cam, 1956; Kiefer and Wolfowitz, 1956; Bahadur, 1958). The examples have been useful in leading to a proper understanding of the concept of consistency.


Let us consider the concept of consistency as originally introduced by Fisher (1922). We have already referred to it as Fisher consistency (FC) to distinguish it from probability consistency (PC), which figures prominently in statistical literature (ref. Kallianpur and Rao, 1955). A statistic is said to be FC for a parameter $\theta$ if

(1) it is an explicit function of the sample distribution function $S_n$ (or of the observed proportions $[p_1, \ldots, p_k]$ in the case of a multinomial), and

(2) the value of the function reduces to $\theta$ identically when $S_n$ is replaced by the true distribution function $F(\theta)$ (or the true proportions $[\pi_1(\theta), \ldots, \pi_k(\theta)]$).

… be considered, whereas in the case of … be arbitrary for any finite sample size … PC may be somewhat dangerous in … samples of a finite size.
It is easy to show that an m.l. estimate is FC without any restrictions whatsoever. For if we consider the multinomial situation, the log likelihood

$$p_1 \log \pi_1(\theta) + \cdots + p_k \log \pi_k(\theta),$$

when $p_i = \pi_i(\phi)$, is a maximum for $\pi_i(\theta) = \pi_i(\phi)$, which implies, when there is one-to-one correspondence between $[\pi_i(\theta)]$ and $\theta$, that $\theta = \phi$. In the continuous case the log likelihood

$$\int \log p(x, \theta)\, dS_n$$

has a maximum for $p(x, \theta) = p(x, \phi)$, or $\theta = \phi$, when $S_n$, the sample distribution function, is replaced by $F(\phi)$.
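Fisher consistency of the m.l. estimate in the multinomial case can be illustrated numerically. The sketch assumes a hypothetical trinomial model with Hardy-Weinberg proportions $\pi(\theta) = (\theta^2, 2\theta(1 - \theta), (1 - \theta)^2)$. When the observed proportions are replaced by $\pi(\phi)$, a grid maximisation of $\sum p_i \log \pi_i(\theta)$ returns $\phi$. For this particular model the maximiser also has a closed form, $\hat\theta = p_1 + p_2/2$, which gives an independent check. The grid size is arbitrary.

```python
import math

def hw_probs(theta: float):
    """Hypothetical trinomial model pi(theta) with Hardy-Weinberg proportions."""
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)

def scaled_loglik(props, theta: float) -> float:
    """p_1 log pi_1(theta) + ... + p_k log pi_k(theta); cells with p_i = 0 add nothing."""
    return sum(p * math.log(q) for p, q in zip(props, hw_probs(theta)) if p > 0)

def ml_estimate(props, grid_size: int = 100_000) -> float:
    """Maximise the scaled log likelihood over the grid 1/grid_size, ..., 1 - 1/grid_size."""
    grid = (k / grid_size for k in range(1, grid_size))
    return max(grid, key=lambda theta: scaled_loglik(props, theta))

phi = 0.3
print(ml_estimate(hw_probs(phi)))  # recovers phi, up to the grid spacing
```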


What can we say about the m.l. estimate when $S_n$ or $(p_1, \ldots, p_k)$ is close to the true distribution function $F(\phi)$ or $[\pi_1(\phi), \ldots, \pi_k(\phi)]$? We may demand that the distribution in the admissible set maximising the likelihood, $F(\hat\theta)$ or $[\pi_1(\hat\theta), \ldots, \pi_k(\hat\theta)]$, if it exists, should be close to the true distribution. This is true under no condition whatsoever on the admissible class of distributions in the case of a finite multinomial (Hotelling, 1930; Rao, 1957), under the sole condition that $\sum \pi_i \log \pi_i$ is convergent in the case of the infinite multinomial (Kiefer and Wolfowitz, 1956; Rao, 1958), and under slightly more restrictive conditions in the case of continuous distributions (Wald, 1949; Kraft, 1955). Examples of inconsistency of the estimated distribution functions due to Bahadur (1958), in the cases of an infinite multinomial distribution and a continuous distribution function, show that they are of a very special character, and it appears that it should be possible to prove convergence of the m.l. estimate of the distribution function under fairly weak conditions.


The situation thus appears to be extremely satisfactory so far as the estimated distribution function is concerned. The corresponding convergence in the estimated parameter then takes place when a continuity condition is satisfied, i.e., $F(\theta) \to F(\phi)$ (or $\pi(\theta) \to \pi(\phi)$) implies that $\theta \to \phi$. It may be noted that a parameter is, after all, a code number used to identify a distribution, and as such it can be arbitrary and need not satisfy any condition. In the examples of inconsistency of m.l. estimates given by Basu (1955)¹¹ and Kraft and Le Cam (1956), the continuity condition is not satisfied, and the examples depend, in a sense, on an unnatural choice of the parameter.


The anomaly regarding inconsistency of the m.l. estimate of a parameter can be resolved to some extent if we consider consistency in a broader sense, as mentioned in Section 2 of this paper. It was observed that if, for any two given values of the parameter, the distributions of the observations tend to be orthogonal as the sample size $\to \infty$, it is reasonable to demand that the distributions of the estimate also behave in the same way. When this is so, we may say the estimate is consistent for the parameter in the wide sense. Such wider consistency is ensured when the estimate tends to two different constants for two different


¹¹ Basu (1955) gave the example of a binomial distribution where the probability of success $p(\theta)$ is defined as follows:

$$p(\theta) = \begin{cases} \theta & \text{if } \theta \text{ is rational}, \\ 1 - \theta & \text{if } \theta \text{ is algebraic irrational}. \end{cases}$$

The m.l. estimate of $\theta$, which is the observed proportion of successes, tends to $\theta$ when $\theta$ is rational and to $(1 - \theta)$ when $\theta$ is algebraic irrational, and is thus not consistent. Basu also shows that there exists another estimate which is consistent for $\theta$. The example of Kraft and Le Cam is more complicated.
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values of the parameter. We need not insist that the constant to which the estimate tends 
should be equal to the true value of the paraméter. In Basu's example, (footnote 11) 
the m.l. estimate tends to 9 when @ is rational and to (1—0) when 0 is algebraic irrational; 
thus the m.l. estimate is consistent in the wide sense. The same is true of the example? 
considered by Neyman and Scott (1948) where the m.l. estimate of о?, the structural para- 
meter, tends to (n—1) o?/n and not to exactly 0°; clearly, the m.l. estimate is consistent in 
our sense. The m.l. estimate is, however, not consistent even in the wide sense in Bahadur's 
examples. 


We may also consider a slightly different kind of example due to H. E. Daniels (quoted in a paper by Kendall and Babington Smith, 1950). Observations (x_1, y_1), ..., (x_n, y_n) are such that

$$x_i = \alpha_i + \epsilon_i, \qquad y_i = \alpha_i + \eta_i \qquad (i = 1, \ldots, n),$$

where ε_i is N(0, σ²), η_i is N(0, τ²), and ε_i and η_i are independently distributed. Simultaneous estimation of α_i, σ² and τ² by the m.l. method leads to the same value for the estimates of σ² and τ², so that the estimates of these two parameters are clearly inconsistent in any sense. This result is perhaps not surprising, for the data themselves do not seem to provide satisfactory discrimination between σ and τ (or between one pair of values of σ, τ and another pair), however large the number of pairs of observations may be, when nothing is known about the behaviour of the incidental parameters α_i as n → ∞.


5. CONCLUSION 


Since the main aim of this paper is to consider apparent anomalies in the m.l. method, no reference has been made to the superiority of the m.l. method over others. It may be claimed that certain other methods also provide estimates which have the same properties as the m.l. estimates in large samples, although they may be subject to similar criticism in other respects. This may be true, but we cannot use asymptotic properties as the sole criteria for the selection of a technique which has to be applied in finite samples in practice. So we have to look for other properties, which hold good for all sample sizes. We may list here some properties of this type which support the claims of m.l. estimates.


The m.l. method has wide applicability. The m.l. estimate is a function of a minimal sufficient statistic, and in special cases is itself a minimal sufficient statistic, a property which may be considered desirable (ref. Rao, 1945, 1946, 1948) and which is not shared, in general, by other general methods of estimation. Finally, consideration of the likelihood function enables us to recognise the minimal sufficient statistic and, if necessary, to supplement the m.l. estimate with other statistics to recover part of the information lost in using the m.l. estimate alone. A more accurate measure of loss of information, based on the variance of Z_n given the estimate T (Fisher, 1925), the asymptotic value of which is more appropriate for comparing statistics when the sample size is not very large, shows that the loss associated with the m.l. estimate is smaller than with many other procedures. A detailed study of this aspect is undertaken in Rao (1960).


¹² Neyman and Scott (1948) consider an increasing number s of series of observations x_ij (i = 1, ..., s; j = 1, ..., n), all independently distributed. The probability law of x_ij is normal with mean α_i and variance σ². The parameter σ² is called structural and is the same for all observations, while the α_i, which vary from series to series, are called incidental parameters. The maximum likelihood estimate of σ² is ΣΣ(x_ij − x̄_i)²/(sn), which is consistent for (n − 1)σ²/n and not for σ².
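The limit stated in footnote 12 follows from a short derivation, sketched here in the footnote's notation (s series of n observations each). It uses the standard fact that the within-series sum of squares Q_i is distributed as σ²χ² with n − 1 degrees of freedom whatever the value of the incidental parameter α_i. The symbol Q_i is introduced here for convenience and is not used in the paper.

```latex
Q_i = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 \sim \sigma^2 \chi^2_{n-1},
\qquad E(Q_i) = (n-1)\,\sigma^2 \quad (i = 1, \ldots, s),

\hat{\sigma}^2 = \frac{1}{sn} \sum_{i=1}^{s} Q_i
  = \frac{1}{n} \Big( \frac{1}{s} \sum_{i=1}^{s} Q_i \Big)
  \;\xrightarrow{\;p\;}\; \frac{n-1}{n}\,\sigma^2
  \qquad (s \to \infty,\ n \text{ fixed}).
```

Because n stays fixed while only s grows, the factor (n − 1)/n never disappears: each new series brings a new incidental parameter α_i along with its n observations, and the independent Q_i obey the law of large numbers around (n − 1)σ² rather than nσ².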
SUMMARY

The method of maximum likelihood (m.l.) for the estimation of unknown parameters has been criticized for the following reasons: (i) it does not give a consistent estimator; (ii) there are estimators more efficient than the maximum likelihood estimators; and (iii) the use of m.l. involves difficult computation. This last criticism would not be important if electronic machines capable of accepting complicated instructions for numerical operations were made available to research workers.

It is shown here, first, that the purpose of estimation is the condensation of the data without loss of essential information, and then some justification is provided for Fisher's measure of information and for the criterion of maximizing the information in a statistic. The concepts of efficiency and consistency have been reformulated so as to provide a criterion for choosing an estimator such that the loss of information is negligible in large samples. An efficient estimator has been defined as an estimator which has an asymptotic correlation of unity with the derivative of the logarithm of the likelihood. A maximum likelihood estimator is, under certain conditions, efficient in this sense.
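Rao's definition of efficiency (an asymptotic correlation of unity with the derivative of the log likelihood) can be illustrated numerically. The sketch below assumes N(θ, 1) observations, for which the derivative of the log likelihood at the true θ is Σ(x_j − θ). The sample mean is then perfectly correlated with it, while the sample median is not; its correlation is known to approach √(2/π) ≈ 0.80 in large samples. The helper names `pearson` and `correlation_with_score`, the sample size and the number of replications are illustrative choices, not taken from the paper.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_with_score(estimator, theta=0.0, n=50, reps=4000, seed=1):
    """Monte Carlo correlation between an estimator and the score at the true theta.

    Assumes x_1..x_n iid N(theta, 1), so d/dtheta log L = sum(x_j - theta).
    """
    rng = random.Random(seed)
    estimates, scores = [], []
    for _ in range(reps):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        estimates.append(estimator(xs))
        scores.append(sum(x - theta for x in xs))
    return pearson(estimates, scores)


r_mean = correlation_with_score(statistics.fmean)
r_median = correlation_with_score(statistics.median)
print(f"mean:   {r_mean:.3f}")
print(f"median: {r_median:.3f}  (large-sample value {math.sqrt(2 / math.pi):.3f})")
```

The squared correlation with the score is, for an unbiased estimator, its efficiency in Fisher's sense, which is one way to see why the two definitions agree under regularity conditions.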


The equivalence of this definition with Fisher's, which states that efficiency is the attainment of minimum asymptotic variance, is established under certain regularity conditions on the statistic. But the new definition resolves the difficulty that arose from the existence of super-efficient estimators having, perhaps, an asymptotic variance smaller than that of the maximum likelihood estimators.

It is shown here that super-efficient estimators are either equivalent to maximum likelihood estimators (when they are efficient in this new sense) or inferior to maximum likelihood estimators for the purposes of statistical inference (when they are not efficient in the new sense).


An estimator is said to be consistent in the wider sense if its asymptotic distributions for two different values of the parameter are orthogonal. Several examples of inconsistency of maximum likelihood estimators in the ordinary sense appear to satisfy the condition of this wider consistency.
APPARENT ANOMALIES AND IRREGULARITIES IN MAXIMUM LIKELIHOOD ESTIMATION
Chairman: E. J. G. Pitman
1. Apparent anomalies and irregularities in maximum likelihood estimation

The author, Mr. Rao, presents his paper.¹

Mr. Neyman: 1. Mr. Rao's very interesting paper brings out certain philosophical questions regarding criticisms levelled at maximum likelihood estimation and, in addition, presents an extensive history of the problem going back to 1922, when the term Maximum Likelihood Estimator (MLE) was first used. The purpose of the present note is to contribute to both subjects: to express my views on the philosophy of theoretical statistical research and to push the historical sketch back to 1908, when the ideas of certain properties of the MLE seem to have been first expressed.

2. The philosophical aspect of the problem is connected with the two different points of view on statistics, one having to do with intensities of belief and the other behavioristic. To me personally,² the intensity-of-belief theory of statistics appears dogmatic and, as reflected in the writings of the various authors, is reducible to proposals, occasionally quite insistent proposals, to adopt specified formulas as measures of intensities of belief which an individual should experience in specified circumstances. One such theory, or creed, advises special formulas as a priori probability distributions for unknown parameters, to be used in cases where the circumstances of the problem do not imply specific a priori distributions, or even do not imply that the parameter considered is a random variable. Further advice is to use the recommended a priori distribution for substitution in the familiar Bayes' formula.

Another modification of essentially the same dogmatic school of thought is based on the premise that the concept of probability is a measure of intensity of belief which is applicable in some cases but not in all. For those cases where probability is not applicable to measure the uncertainty, the proponents of the relevant school of thought devise new measures of confidence or diffidence, and one of them is the mathematical likelihood. In thinking of these and similar attempts at foundations of mathematical statistics, I recall the expressive title of two articles of our recently deceased colleague and friend, D. van Dantzig: "Statistical Priesthood" I and II.³

The alternative point of view on foundations of statistics, the behavioristic or operational point of view, stems from some ideas of Laplace and, expressed somewhat more clearly, of Gauss. Leaving aside the question of confidence and diffidence, the behavioristic point of view concentrates on those cases where mathematical probability is an idealization of relative frequencies as experienced in the realm of natural phenomena. Here, as is most frequently the case, we are confronted with the necessity of a choice among a number of possible actions, and the desirability of each action depends upon the value of a parameter intervening in the distribution of the observable random variables. If the value of this parameter is known or assumed known, there is no problem. Statistical problems arise when the relevant parameter is not known and the choice of action has to be based on the values of the observable random variables, that is, on the value of an estimator of the parameter. The problem of estimation is, then, to devise the estimator. This problem splits into a number of detailed problems. One is to establish the properties of all the different estimators that are available to choose from in a given problem. Another detailed problem is to devise the method, as easy a method as possible, of calculating the estimator having the properties that fit the situation best.

The properties of an estimator which may be considered desirable vary from one particular problem to the next. Also, undoubtedly, they depend on subjective elements: it is quite conceivable that two different persons contemplating the same problem will have different preferences for the properties that an estimator should have. In some problems and to some individuals, unbiasedness of the estimator and the smallness of its variance appear of paramount importance. Here, Gauss' method of least squares is frequently the answer. In other cases, unbiasedness and small variance are secondary or irrelevant, and some other property appears important. Thus, for example, in the problem of estimating the degree of contamination of drinking water, it may appear most important not to underestimate the contamination and, from the point of view of public health, the most desirable estimator is certainly not unbiased.


¹ Bull. Inst. Int. Stat., XXXVIII, 4, p. 439.

² J. Neyman, "Inductive behaviour as a basic concept of philosophy of science." Rev. Int. Stat. Inst., Vol. 25 (1957), pp. 7-22.

³ D. van Dantzig, "Statistical Priesthood" I and II, Statistica Neerlandica, Vol. II (1957) and
With reference to Rao's discussion of criticisms of maximum likelihood estimation, I wish to make it clear that my own criticisms are directed not towards the use of MLE but to the insistence that these estimators, or indeed any other estimators, be used as a matter of principle. In my opinion, any user of statistical methods should have complete freedom of choice, and the role of theory in the matter is to elucidate the properties of the methods that are available. Thus, for example, the famous inequality giving the greatest lower bound of the variance of an unbiased estimator, first found by Fréchet and then, independently, by Harald Cramér and Rao, is a very important result. It is purely behavioristic or operational and tells us that, under certain conditions, if we insist on using unbiased estimators then, no matter what we do, the variance of the estimator cannot be less than a calculable limit. As indicated by a slightly more general version of the same inequality, there may be a possibility of finding an estimator which has mean square error less than the bound of Rao; then this estimator must be biased. With this in mind, the consumer of statistical theory may perhaps decide to drop the requirement of unbiasedness.
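The possibility of a biased estimator beating the bound for unbiased estimators can be checked with a textbook case; this is not Berkson's example. For n observations from N(μ, σ²), the bound for unbiased estimators of μ is σ²/n, attained by the sample mean x̄. The shrunken estimator c·x̄ with 0 < c < 1 has bias (c − 1)μ and mean square error c²σ²/n + (1 − c)²μ². The function names and the choice c = 0.8 below are illustrative assumptions.

```python
def cramer_rao_bound(sigma2, n):
    """Lower bound for the variance of any unbiased estimator of a normal mean."""
    return sigma2 / n


def mse_shrunk_mean(c, mu, sigma2, n):
    """Exact MSE of c * xbar: variance c^2 sigma^2 / n plus squared bias ((c - 1) mu)^2."""
    return c * c * sigma2 / n + (1 - c) ** 2 * mu * mu


n, sigma2, c = 10, 1.0, 0.8
bound = cramer_rao_bound(sigma2, n)
for mu in (0.0, 0.1, 0.3, 1.0):
    mse = mse_shrunk_mean(c, mu, sigma2, n)
    verdict = "below" if mse < bound else "not below"
    print(f"mu={mu:4.1f}  MSE(c*xbar)={mse:.4f}  bound={bound:.4f}  ({verdict})")
```

The gain holds only for part of the parameter space: MSE(c·x̄) < σ²/n exactly when μ² < (1 + c)σ²/((1 − c)n), which here means |μ| < √0.9 ≈ 0.95. This is why such an estimator is attractive only if one has reason to expect μ in that range.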


For quite some time the possibility of biased estimators with mean square errors less than the bound for the variance of an unbiased estimator remained just a theoretical possibility, with no live example to show that they really exist. Then Mr. Joseph Berkson appeared on the scene and produced a real case of an estimator, obtained by minimizing the classical Karl Pearson χ², which not only has its mean square error less than that of MLE but also less than the indicated bound! I submit that this particular result of Berkson is of considerable interest and importance. Quite apart from the possibility of estimating the parameter with precision, in the sense of mean square error, better than other known estimators, this result raises a host of novel theoretical problems: what are the situations in which biased estimators exist with their mean square errors less than the Fréchet-Cramér-Rao lower bound for variances of unbiased estimators? Can one invent a method, a machinery such as the maximalization of the likelihood or the minimalization of the χ², which, at least in some cases, would lead to such estimators if they exist?


I note that Rao does not particularly like Berkson's result, apparently for the reason that in the particular problem considered, Rao's own interest centers on a parameter different from the one estimated by Berkson. Evidently, we consider the question from different points of view.


3. Turning to the other part of my contribution, concerned with a detail in the history of MLE, I find it interesting that, at least on two occasions, the idea of MLE sprang up from the dogmatic intensity-of-belief approach to statistics. However, in both cases the original dogmatic approach was followed by studies of a distinctly behavioristic or operational character. The two approaches can be roughly summarized as follows:

(i) First statement: MLE should be used because this use is implied by such and such principle.

(ii) Second statement: The consistent use of MLE will guarantee such and such long-range advantages.

As far as I am aware, the priority in the approach to MLE just described, involving both statements (i) and (ii), belongs to F. Y. Edgeworth.⁴ The dogmatic intensity-of-belief ideas of Edgeworth, which are also noticeable in Laplace, were connected with the arbitrary a priori distributions and the use of Bayes' formula. The fact that this brought Edgeworth to the use of what we now call MLE is occasionally noted in the literature. For example, an appropriate reference is found in M. G. Kendall's book.⁵ However, it is much less generally known, and seems to have escaped the attention of Rao, that, after making statements roughly equivalent to (i), Edgeworth proceeded to formulate a conjecture in the spirit of statement (ii) above. The passage I have particularly in mind, published in 1908, is printed in the Appendix of a very large and involved paper. It so happens that this conjecture of Edgeworth is now known to be broadly true, but with some exceptions. Also, as reflected in the excellent historical summary given in the present paper by Rao, although 52 years have elapsed since the publication of Edgeworth's conjecture, the limits of its validity are still the subject of numerous studies all over the world. In these circumstances, and in view of the general lack of awareness of the identity of the author of the conjecture, it appears appropriate to reproduce here a brief quotation from the Appendix of Edgeworth's paper.


⁴ F. Y. Edgeworth, "On the probable errors of frequency constants," J.R.S.S., Vol. 71 (1908), pp.

⁵ M. G. Kendall, "The Advanced Theory of Statistics," Vol. II, Griffin, London, 1946.
Appendix
This Appendix is designed as a receptacle for some mathematical developments which might have interrupted the course of the preceding arguments.
I. Love's proof of some preceding propositions.
A foremost place is due to Love's confirmation of certain propositions above stated by means of an independent proof.

The first proposition thus verified is a particular case of a general theorem which may thus be provisionally restated. Let y = e^{ψ(x)} be a frequency-function apt to represent the distribution of statistical observations. Let x_1, x_2, ..., x_n be a set of n observations forming a random selection from the indefinitely large group of the observations ranging under the frequency curve. Let φ(x_1, x_2, ..., x_n) be that function of the given observations which affords the most probable value (as determined by inverse probability) of the sought point to which the observations relate; a symmetrical function when, as will be here supposed, the observations are all of equal weight or worth. Then, if we take (at random) a series of sets, such as

$$\begin{array}{cccc}
x_{11}, & x_{12}, & \ldots, & x_{1n} \\
x_{21}, & x_{22}, & \ldots, & x_{2n} \\
\vdots  &         &         &        \\
x_{m1}, & x_{m2}, & \ldots, & x_{mn}
\end{array}$$

and form for each set the corresponding value of φ, the series of mean values thus formed, say ${}^{1}\phi, {}^{2}\phi, \ldots, {}^{m}\phi$, will be such that (m and n being large numbers) the mean square of their deviation from the true point, say x, viz.,

$$\frac{({}^{1}\phi - x)^2 + ({}^{2}\phi - x)^2 + \cdots + ({}^{m}\phi - x)^2}{m},$$

will be less than the mean square of deviation presented by any other set of mean values ${}^{1}X, {}^{2}X, \ldots, {}^{m}X$, each formed from a set of n observations, where X (like φ) is a symmetrical function of observations, having the properties of an average.

In line with the Victorian style, the above passage is interspersed with footnotes. I take the liberty of omitting these.

In contemplating this passage, one is struck by the change in style, terminology and precision of expression which has occurred during the half-century that has elapsed since the publication of Edgeworth's paper. However, the translation of the passage into modern terms presents little difficulty.

The function y = exp(ψ(x)) is the probability density of an observable random variable, say X. Further context suggests that, in addition to x, the particular value of X, the function ψ depends upon a parameter, say θ, and that the actual value of this parameter, say θ₀, is unknown. The value θ₀ is described by Edgeworth as "the sought point to which the observations relate" and, later on, as "the true point, say x, ...". In order to avoid the use of the same letter x in several different meanings, I introduce the symbol θ₀.
The symbols x_ij for i = 1, 2, ..., m and j = 1, 2, ..., n represent independent observations on the variable X arranged in m samples of n observations each. The function φ, described as "that function of the given observations, which affords the most probable value (as determined by inverse probability)", is simply the maximum likelihood estimator of θ₀.

Edgeworth's assertion is that, if X is an alternative estimator of θ₀, based on the same observations as φ and subject to some not distinctly stated limitations ("where X (like φ) is a symmetrical function having the properties of an average"), then the asymptotic mean square error of φ will be less than (presumably, not greater than) that of X.
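Edgeworth's procedure (m sets of n observations, one estimate per set, then the mean square deviation of the m estimates from the true point) is easy to mimic by simulation. The sketch assumes normal observations with θ₀ = 0 and unit variance, where the sample mean is the maximum likelihood estimator, and uses the sample median as one alternative symmetric "average". The function name `edgeworth_mean_square` and the values of m and n are illustrative, not Edgeworth's.

```python
import random
import statistics


def edgeworth_mean_square(estimator, m=4000, n=25, theta0=0.0, seed=2):
    """Mean square deviation from theta0 of `estimator` over m sets of n N(theta0, 1) draws."""
    rng = random.Random(seed)  # same seed, so every estimator sees the same m sets
    total = 0.0
    for _ in range(m):
        sample = [rng.gauss(theta0, 1.0) for _ in range(n)]
        total += (estimator(sample) - theta0) ** 2
    return total / m


ms_mle = edgeworth_mean_square(statistics.fmean)       # ML estimator of a normal location
ms_median = edgeworth_mean_square(statistics.median)   # another symmetric average
print(f"mean square, MLE (sample mean): {ms_mle:.4f}")
print(f"mean square, sample median:     {ms_median:.4f}")
print(f"ratio median/MLE:               {ms_median / ms_mle:.2f}")
```

For normal data the ratio should approach π/2 ≈ 1.57 as n grows. The run compares the MLE with a single alternative under a single assumed density, so it illustrates the conjecture without proving anything about it.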

Edgeworth was not able to prove his conjecture to his own satisfaction and tried to enlist the help of Love. Unfortunately, Love's success was limited. However, Edgeworth's assertion compares favourably with those found in a number of recent books on statistics which flatly assert that, as proved by somebody or other, the asymptotic variance of MLE is a minimum, without any limitations. In favour of Edgeworth is the realization that some sort of restriction on the alternative estimator X is necessary.

The ideas of Edgeworth did not seem to have much influence on the thinking of contemporary statisticians, and the above clear-cut statement of the presumed optimal property of MLE went unnoticed. The idea reappeared in the literature fourteen years later in the famous paper of R. A. Fisher, to whom we owe a great number of other concepts and terms: consistency, efficiency, sufficiency, etc. Here, again, the origin of MLE was in the degree-of-belief approach to statistics, but based on principles different from those of Edgeworth. Also in this case, the original approach, which appears to me dogmatic, was followed by increasingly accurate behavioristic studies. This part of the history appears adequately covered by Rao, and I need not enter into any details.

4. Before concluding, I would like to request Professor Rao to explain his philosophical standpoint a little more clearly than he does in his paper. Some passages in his paper suggest the possibility that, in Rao's opinion, MLE should always be used irrespective of the properties it may have. Does Rao really mean this? The passages I have in mind include Rao's Introduction and his Conclusions.

In his Introduction Rao lists four different reasons for which maximum likelihood estimation has been mainly criticised. In the Conclusions the various advantages of MLE are listed. It looks as if the choice of a method of estimation is treated more or less like the choice of an automobile which a family will have to use for a number of years. All automobiles on the market are open to some criticisms, and some of them have certain advantages. The standpoint of Rao seems to be that the advantages of the automobile MLE outweigh the disadvantages. This impression is fortified by Rao's dealing with what he considers as criticisms of MLE. One reason listed is that in certain cases MLE have been shown to be inconsistent in the usual sense of the term. This fact is not denied by Rao. Instead, he introduces a distinction between PC and FC, consistency in the sense of convergence in probability and consistency in the sense of Fisher. Also, there are some other interesting connections in which the term consistency is used. It is then shown that in some cases where the MLE are inconsistent in one sense, they are consistent in another sense.

Another ground for criticisms discussed by Rao is that, in some specified cases, consistent estimators of a parameter θ are readily available with mean square errors that are less than those of MLE. In one such case, Rao's stand seems to be that it is pointless to try to estimate θ. In my opinion, the difference between selecting an automobile and selecting a method of estimation is that the car is, so to speak, indivisible. It is impossible for a purchaser to take some characteristics of a Volkswagen, very desirable for short trips in town, and combine them with certain other characteristics of a Rolls Royce, most desirable for extensive travel. If the family is limited to a single car, it must face the necessity of weighing the relative advantages of each make against the disadvantages. No such necessity exists in the choice of a method of estimation. Provided one knows the properties of the several methods available for the given problems, and provided one is clear as to what one wants to achieve, one is at liberty to use MLE in certain cases and some alternative estimators in others. However, in order to avoid disappointments, it is quite essential to know what the properties of the different estimators are.

From this point of view, the authors whom Rao considers as critics of MLE are not really critics. They just provide us with valuable information.

In order to make Rao's philosophical stand quite clear, I suggest that he give an unambiguous answer to a trivial question which, however, is both specific and illustrative. Suppose that an association of manufacturers of certain measuring instruments is anxious to have a formula for estimating the error variance σ² of each instrument. Suppose that with each instrument a moderate number n of independent measurements are made of a large number N of different objects. With the usual assumptions and with the usual notation, the two contemplated estimators are

$$S^2 = \frac{1}{Nn}\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2, \qquad S_1^2 = \frac{nS^2}{n-1}.$$

The first estimator, S², is the ML estimator. The second is not. However, the first estimator has an operational property which may seem undesirable: it is inconsistent. In fact, as N → ∞, the first estimator tends in probability not to σ² but to a smaller number, (n − 1)σ²/n. On the other hand, the second estimator is consistent and, in fact, unbiased.

The question is: which of the two estimators would Rao recommend? I hope that Rao's advice will be behavioristic, in favour of S₁². If it is not, and if he insists on MLE, there may be trouble. In fact, there may be a lawsuit for damages. For, if one of the manufacturers, say A, has his n = 10 and another manufacturer B has n = 2, the manufacturer A will have a legitimate reason to complain.
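Neyman's instrument example can be simulated directly. The sketch assumes both manufacturers' instruments have the same true error variance σ² = 1, that each of N objects has its own unknown true value (drawn arbitrarily here, since neither estimator depends on it), and that manufacturer A takes n = 10 and manufacturer B takes n = 2 measurements per object. The function name `instrument_estimates` and the value of N are illustrative.

```python
import random


def instrument_estimates(n, n_objects=5000, sigma=1.0, seed=3):
    """Return (S2, S1_2) for n measurements on each of n_objects objects.

    S2 is the ML estimator sum_i sum_j (x_ij - xbar_i)^2 / (N n);
    S1_2 = n * S2 / (n - 1) is the corrected, unbiased estimator.
    """
    rng = random.Random(seed)
    within_ss = 0.0
    for _ in range(n_objects):
        true_value = rng.uniform(-100.0, 100.0)  # incidental parameter of this object
        xs = [rng.gauss(true_value, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        within_ss += sum((x - xbar) ** 2 for x in xs)
    s2 = within_ss / (n_objects * n)
    return s2, n * s2 / (n - 1)


for name, n in (("A", 10), ("B", 2)):
    s2, s1_2 = instrument_estimates(n)
    print(f"manufacturer {name} (n={n:2d}): ML S^2 = {s2:.3f}, corrected S1^2 = {s1_2:.3f}")
```

With equally precise instruments, the ML estimator reports an error variance near 0.9 for A and near 0.5 for B, so B's instruments appear markedly more precise, while the corrected estimator gives values close to 1 for both.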
There is another question. In his paper Rao writes that someone has suggested that the super-efficient estimators constructed by Hodges and Le Cam be used in practice. Would Rao kindly indicate who made this suggestion? The point is that, with Rao's statement as it now stands, the reader is likely to infer that the suggestion came from either Hodges or Le Cam or from both. At this I would be most surprised.

The Hodges-Le Cam estimators are the usual counterexamples showing that certain theorems, thought to have been proved rigorously, are in fact false. In the present case the theorem in question is: out of all consistent and asymptotically normal estimators, the MLE has a minimum asymptotic variance for all values of the estimated parameter. Hodges' example showed that this theorem is false and that, for the assertion to be true, it is necessary to consider not all the estimators of the kind described but those of some limited class.
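Hodges' counterexample can be reproduced numerically. The sketch assumes a commonly quoted textbook form of Hodges' estimator for the mean θ of N(θ, 1): report the sample mean x̄ unless |x̄| < n^(-1/4), in which case report 0. Scaled by n, the mean square error of x̄ is 1 for every θ. Hodges' estimator does far better at θ = 0 but much worse at values of θ near n^(-1/4), which is why such estimators serve as counterexamples rather than recommendations. The function names, n and the number of replications are illustrative.

```python
import random


def hodges(xbar, n):
    """Hodges-type superefficient estimator: replace a small sample mean by 0."""
    return xbar if abs(xbar) >= n ** -0.25 else 0.0


def scaled_mse(theta, n=100, reps=20000, seed=4):
    """Return n * MSE for the sample mean and for Hodges' estimator when x_i ~ N(theta, 1)."""
    rng = random.Random(seed)
    se_mean = se_hodges = 0.0
    for _ in range(reps):
        xbar = rng.gauss(theta, 1.0 / n ** 0.5)  # exact sampling distribution of the mean
        se_mean += (xbar - theta) ** 2
        se_hodges += (hodges(xbar, n) - theta) ** 2
    return n * se_mean / reps, n * se_hodges / reps


for theta in (0.0, 0.3, 1.0):
    m, h = scaled_mse(theta)
    print(f"theta={theta:3.1f}  n*MSE mean={m:.3f}  n*MSE Hodges={h:.3f}")
```

At θ = 0.3, just below the threshold 100^(-1/4) ≈ 0.316, the estimator is set to 0 more than half the time, and its scaled mean square error is several times that of x̄.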


SUMMARY


Mr. Rao's interesting paper suggests to me some reflections of a philosophical nature and some others of a historical nature. First, it seems striking to me that maximum likelihood estimators (M.L.E.) appear to be recommended by certain authors, including Mr. Rao, for reasons of two different kinds: because in certain cases these estimators possess desirable properties, and because their systematic use is a matter of principle, independently of the consequences. The first reason is entirely natural, but the second seems strange to me. Here is an example. Consider some measuring instruments. To characterize their precision, each instrument is used to make m independent measurements on each of n different objects. Let X_ij, distributed N(ξ_i, σ²), be one of these measurements. A public agency, whose aim is to characterize the average precision of the instruments produced by different factories, needs an estimator of the variance σ² of the measurement error. The formula

$$S^2 = \frac{1}{mn}\sum_{i}\sum_{j}(X_{ij} - \bar{X}_i)^2 \qquad (1)$$

represents the M.L. estimator. Suppose that two factories, A and B, produce identical instruments, with σ = 1. Suppose that in factory A, m = 2. Then, as is well known, as n increases, plim S² = 0.5. On the other hand, if in factory B m = 10, then plim S² = 0.9. Thus, in this case, the application of M.L.E. would lead to the false conclusion that the instruments coming from A are much more precise than those coming from B. On the other hand, it is easy to define an estimator of σ² that does not have this drawback. What interests me is whether Mr. Rao would recommend the use of (1), even under the conditions indicated, for the sole reason that this use is prescribed by Mr. Fisher's M.L. principle. A historical detail: to my knowledge, the first attempt to formulate a theorem implying the desirable properties of M.L.E. is found in a work by F. Y. Edgeworth published in 1908. I quote a passage from it in my English text.


Mr. Kitagawa: I believe that Mr. Rao has been most successful in attaining his main purpose in this paper, namely, in resolving apparent anomalies and irregularities in maximum likelihood by reformulating the notions of efficiency and consistency in a very natural and elegant way, and also in closer connection with the original ideas of Sir Ronald Fisher. It is the merit of the present paper that further discussions can and must be conducted from more essential standpoints, including philosophical ones, than those connected merely with mathematical techniques.

Mr. Berkson: Professor Rao has presented a novel concept of estimation, and it is interesting to visualize its operation in the bio-assay case. Recall how the bio-assay problem arises in its medical application. The physician has ordered the administration of, say, 200 units of insulin in a unit volume, and the pharmacist is to prepare a solution with that concentration. If his stock solution contains 400 units per unit volume, he will dilute it to half its strength. If it contains m units, he will dilute it in a proportion of 200/m. He makes a bio-assay to find out what is the value of m. In terms of the decision concept of estimation, the decision involved here is the number of cubic centimeters of diluent to add to the stock solution in order to bring it to the strength required by the physician. The measure of the efficacy with which the bio-assay is accomplished is some average of the error made in estimating m, and the classic measure used, though not the only conceivable one, is the mean square error. Perhaps we will give it loftier statistical prestige if we call it a "loss function."
Mr. Rao suggests that in statistics, estimation is only an incidental matter and that the serious statistical objective is the condensation of the data to economical form so that, for instance, they may be added to similar data obtained in another bio-assay. We can imagine doing this. We perform a bio-assay and record the results in a statistically efficient way, perhaps as minimal sufficient statistics, or perhaps as the whole likelihood function. When we perform another bio-assay we will, in the same manner, add the efficiently summarized data of that bio-assay to the efficiently summarized data of the first bio-assay. When we make another bio-assay, we will add the efficiently summarized data to those already accumulated, and so forth. Now, if instead of making a point estimate in the first instance and diluting the solution to the required 200 units in accord with that point estimate, the statistician faithfully pursues the avowed purpose of estimation to condense the data and to prepare a whole series of fiducial limits in the sense of Fisher, or confidence limits in the sense of Neyman, what will happen? Well, what will happen in the meantime is that the patient will die in diabetic coma! This, of course, is irrelevant to the logical development of the fundamentals of statistics, but it is a point. Another point is that the law, in its benightedness, does not allow the killing of a patient with an overdose of inference theory and an underdose of insulin. If I am the statistical bio-assayist following Rao's theories (and this is conceivable, since I am a great admirer of Rao), I will be committed to the hoosegow on the charge of malpractice. I hope that while I am in durance vile, my friend Rao will visit me. It will be a consolation to contemplate with him the ultimate nature of statistics and to realize that while I am suffering on bread and water it is in the noble cause of statistics considered as right thinking and correct rational inference, regardless of practical consequences.


Now a word about the summarization of data. Suppose we accept Fisher's measure of efficiency as the proportion of available information (in a certain sense) extracted by a statistic T [8].⁶ It is obvious that this cannot be a measure of the efficacy of T as an estimator. Any “random” number or even a meaningless symbol T that is a one-to-one function of the possible samples will be completely efficient (sufficient) in this sense. However, it would hardly do as an estimator. But considering Fisher's efficiency only as a measure of effective condensation of data, what is the relation of it to maximum likelihood estimation? It should be emphasized that Rao definitely did not say that this estimator necessarily extracts as much information as possible. But I have the impression that such a claim has been made, though this may be a misunderstanding. It seems to be widely believed, for instance, that where a sufficient statistic other than the sample exists, the maximum likelihood estimate will be sufficient and hence will extract the total amount of information available [8] [9] [10] [11]. But this is not strictly true. What seems to be true is that the maximum likelihood estimate will be a function of the minimal sufficient statistic, but it will not necessarily be a one-to-one function, and therefore it will not necessarily be sufficient. In such cases (and they seem to be of fairly common occurrence, e.g. [5]) the maximum likelihood estimate would not be sufficient even for storing the total “information.” For this purpose, one would have to store at least the sufficient statistics themselves. In the instance of the logistic function with binomial variation, there are minimal sufficient statistics for its parameters α, β. For the case of a “bio-assay” experiment with three equally spaced “doses,” n = 10 animals exposed at each dose, both parameters to be estimated, a minimum χ² estimate which I call the “minimum logit χ² estimate” is consistent FC as well as consistent PC, and is asymptotically efficient. For finite samples it is sufficient, and extracts the total amount of available information. The same is true for an infinite number of other controllable experimental arrangements, though not in all such arrangements. The maximum likelihood estimate is also consistent FC and PC and is asymptotically efficient, but for finite samples it has larger mean square error than the minimum logit χ² estimate, and it is not sufficient. It loses a calculable amount of information, which is small for an experiment in which the probability Pc of response at the central dose is 50 per cent, but the proportion of information lost increases as the experiment is asymmetrically placed, and approaches unity as Pc approaches 1 or zero. I should like to ask Professor Rao whether, with an experiment such as described, he would still prefer the maximum likelihood estimator to the minimum logit χ² estimator.

6 Rao reiterates this definition, but one should note that Fisher did not limit it to asymptotically normal estimators, and specifically applied it “to finite samples and to other cases where the distribution is not normal.” The definition is pertinent as a measure of the sufficiency of a statistic, but not as the efficiency of an estimator. This distinction is widely recognized.
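To make the bio-assay comparison concrete, here is a minimal sketch, assuming a two-parameter logistic P(x) = 1/(1 + exp(−(α + βx))), three equally spaced doses coded x = −1, 0, 1 with n = 10 animals each, and hypothetical response counts. The maximum likelihood estimate is found by Newton-Raphson; the minimum logit χ² estimate is taken, as it is usually described, to be weighted least squares on the empirical logits with weights n p̂(1 − p̂). The function names and data are illustrative only and are not taken from the papers cited. Two outcomes with the same values of Σr and Σxr give identical maximum likelihood estimates, as expected of a function of the minimal sufficient statistic, while their minimum logit χ² estimates differ.

```python
import math

def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def logistic_mle(x, r, n, iters=50):
    """Maximum likelihood for P(x) = expit(alpha + beta * x) by Newton-Raphson.

    The score depends on the data only through sum(r) and sum(x * r),
    the minimal sufficient statistics for (alpha, beta).
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        p = [expit(alpha + beta * xi) for xi in x]
        u0 = sum(ri - ni * pi for ri, ni, pi in zip(r, n, p))
        u1 = sum(xi * (ri - ni * pi) for xi, ri, ni, pi in zip(x, r, n, p))
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 information system I @ step = score
        alpha += (i11 * u0 - i01 * u1) / det
        beta += (i00 * u1 - i01 * u0) / det
    return alpha, beta

def min_logit_chi2(x, r, n):
    """Weighted least squares of empirical logits, weights n * p_hat * (1 - p_hat).

    Assumes 0 < r_i < n_i at every dose, so every empirical logit is finite.
    """
    ph = [ri / ni for ri, ni in zip(r, n)]
    logits = [math.log(q / (1.0 - q)) for q in ph]
    w = [ni * q * (1.0 - q) for ni, q in zip(n, ph)]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    lbar = sum(wi * li for wi, li in zip(w, logits)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxl = sum(wi * (xi - xbar) * (li - lbar) for wi, xi, li in zip(w, x, logits))
    beta = sxl / sxx
    return lbar - beta * xbar, beta

# Three equally spaced doses, 10 animals each (hypothetical counts).
x = [-1.0, 0.0, 1.0]
n = [10, 10, 10]
r_a = [1, 4, 8]   # sum(r) = 13, sum(x * r) = 7
r_b = [2, 2, 9]   # same sufficient statistics, different outcome
mle_a, mle_b = logistic_mle(x, r_a, n), logistic_mle(x, r_b, n)
mlc_a, mlc_b = min_logit_chi2(x, r_a, n), min_logit_chi2(x, r_b, n)
print("ML:", mle_a, mle_b)
print("min logit chi2:", mlc_a, mlc_b)
```

The sketch does not settle which estimator is preferable; it only shows the structural point about dependence on (Σr, Σxr). Outcomes with a dose at 0 or 10 responses would need the adjustments used in practice for empirical logits.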
... to the point of simple-mindedness. Statistics is ... to estimate a specified parameter, ... objective, the measure of the relative worth of ... is the value of some loss function such as the mean square error. In using the mean square error, I do not mean to make the world believe that its loss is always proportional to the square of the error, as Rao implies, but take it only as a representative loss function, though, of course, it is the classic measure of error and the most widely used. It does seem to be a good working rule that the probability of making an error greater than a critical size is probably larger, the larger the variance of the measure, though the conditions in which this is certainly true are limited indeed. I do not know any theorems in which the probability of significant errors is smaller, the larger the mean error, but I know some in the opposite sense, for instance the normal law as a precise case, and the Tschebysheff rule for an approximate evaluation. I do not think that it is a matter of indifference whether the mean square error of an estimate is large or small, and I see no reason for preferring an estimate with large mean square error. Knowing nothing relevant to the contrary about an assay, I should want medicine that I prescribed to be assayed by a method with known small mean error. Indeed, I should feel duty bound to insist on it. And if there were a reason for my disregarding the small mean error, it could not be because there was another method that better condensed the data. I quite disagree with Rao when he defines the purpose of estimation as the condensation of data. The object of estimation is to evaluate the parameter with as little error as possible, in some acceptable definition of “error.” To define the objective as condensation of data, irrespective of error, seems to me not to point up the essential purpose of estimation, but to divert us from it. Rao's apparent predilection for an assay with large average error seems to me unnatural. His present belittling of small mean square error is puzzling. I notice that elsewhere he characterizes an unbiased estimate as “best” if it has minimum attainable variance [12].⁷
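The Tschebysheff rule mentioned here bounds the probability of an error of at least k by the mean square error divided by k², which is why a small mean square error guarantees a small probability of large errors. The sketch below checks this exactly for a binomial proportion; the values n = 20, p = 0.3, k = 0.12 and the second, biased estimator are hypothetical choices for illustration.

```python
from math import comb

def error_profile(estimator, n, p, k):
    """Exact P(|T - p| >= k) and exact mean square error of T = estimator(r, n)."""
    tail = mse = 0.0
    for r in range(n + 1):
        prob = comb(n, r) * p**r * (1 - p) ** (n - r)
        err = estimator(r, n) - p
        mse += prob * err * err
        if abs(err) >= k:
            tail += prob
    return tail, mse

def proportion(r, n):
    return r / n

def shrunk(r, n):
    return (r + 1) / (n + 2)   # illustrative biased estimator

n, p, k = 20, 0.3, 0.12   # hypothetical values
results = {}
for est in (proportion, shrunk):
    tail, mse = error_profile(est, n, p, k)
    # exact tail probability versus the Tschebysheff bound E[(T - p)^2] / k^2
    results[est.__name__] = (tail, mse / k**2)
    print(est.__name__, tail, mse / k**2)
```

The bound is crude, as the text says ("an approximate evaluation"), but it holds for every estimator and every p.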


Now this does not mean that the loss function of mean square error is the only conceivable one, or that it is necessarily definitive. If, in a particular application, some other loss function suggests itself, let it be investigated. Rao has questioned my use of the mean square error, which is the loss function of Gauss,⁸ when comparing some minimum χ² estimates with the maximum likelihood estimate of the parameters of the logistic function and of the integrated normal function. In these investigations it was found


7 Rao has informed me that he was only using accepted terminology here, without implying that such an estimator is best from a practical view. Even so, the use reflects a generally accepted attitude and does not support Rao's suggestion that his view is shared by most statisticians.


8 Since Edgeworth and Gauss have been mentioned in this discussion, the following quotation from Edgeworth [7] referring to Gauss is interesting:

The reflections of the great mathematician on this branch of mathematical physics deserve to be transcribed here: “That the metaphysic employed in my Theoria Motus Corp. Coel. ... to justify the method of least squares has been subsequently allowed by me to drop (Dass ich ... habe fallen lassen) has occurred chiefly for a reason that I have myself not mentioned publicly. The fact is, I cannot but think it in every way less important to ascertain that value of an unknown magnitude the probability of which is the greatest (which probability is nevertheless infinitely small) rather than that value by employing which we render the Expectation of detriment a minimum (an welchen sich haltend man das am wenigsten nachteilige Spiel hat). Thus if f(a) represents the probability of the value a being assumed by (für) the unknown quantity x, it is not so important (ist weniger daran gelegen) that f(a) should be a maximum as that $\int f(x)\,F(x-a)\,dx$, the integral extending over all possible values of x, should be a minimum; when for F is selected a function that is continually positive and continually increases in a due degree (auf eine schickliche Art) with the increase of the variable. That the square is selected for this purpose is purely arbitrary, and it is in the nature of the subject that there should be this arbitrariness (Willkürlichkeit). Except for the well-known very great advantages ... which the choice of the square secures, one might have chosen any other function satisfying the above conditions.”

Rao has asked me why I thought it pertinent to consider the square error as a measure of the efficiency of the estimators which I studied. My reasons are the same as Gauss's.
that some minimum χ² estimators are more efficient [1] [2] [3] [4]. But he has presented no parallel
analysis of these estimates, on the basis of some other loss function, disagreeing with this result. Until 
he does so, judgement of his criticism of my work should be held in abeyance. 
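Gauss's criterion in the quoted passage, minimizing the expected detriment ∫ f(x) F(x − a) dx rather than maximizing f(a), can be checked numerically. The sketch below assumes a hypothetical skewed discrete distribution in place of f and a grid of candidate values a. With F the square the criterion selects the mean; with F the absolute value it selects the median; the most probable value is the mode, and the three differ.

```python
# Hypothetical skewed distribution standing in for f in Gauss's criterion.
support = [0, 1, 2, 3, 4, 5]
probs = [0.30, 0.25, 0.18, 0.12, 0.09, 0.06]

def expected_detriment(a, loss):
    """Discrete analogue of Gauss's integral of f(x) * F(x - a) over all x."""
    return sum(p * loss(x - a) for x, p in zip(support, probs))

candidates = [i / 100 for i in range(501)]  # a = 0.00, 0.01, ..., 5.00
best_square = min(candidates, key=lambda a: expected_detriment(a, lambda d: d * d))
best_absolute = min(candidates, key=lambda a: expected_detriment(a, abs))
most_probable = support[probs.index(max(probs))]

print(most_probable, best_absolute, best_square)  # prints: 0 1.0 1.63 (mode, median, mean)
```

The choice of F decides the answer, which is Gauss's point about the arbitrariness of the square.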

Although at one point Rao criticises the use of the criterion of mean square error, at another point he seems to approve it, for he suggests that my “difficulties” are due to having applied it to the estimate of the wrong parameters. In my papers I was concerned with estimating parameters α and β (location and scale) of the logistic function. I shall explain that some 10 years ago, when I became concerned with the problem of statistical bio-assay, I found that not only were there innumerable articles but there were even several books concerned with this problem (or its equivalent in terms of other functions). It would seem that this itself is sufficient justification for examining the estimate of these parameters. But I may go further. Wishing to get data and examples in actual use, I communicated with several pharmaceutical firms, and from one very important house I received copious data of bio-assays that had been performed “by the method of Bliss” (“probits,” with maximum likelihood). In all these assays a value of β was assumed as known from previous experience, and the problem was to estimate α (from which the E.D.50 followed directly). This was the origin of my taking as the paradigmatic problem for mathematical statistical bio-assay the estimation of α with β known. Rao says I should instead have considered estimating the probabilities of death at various doses. But it is not for the mathematician to say what parameters should be estimated. It is his function only to say how parameters that are specified for him can best be estimated, if he can! If Rao does not say that the Pᵢ's should be estimated instead of α, β, but that it would be interesting and important to consider the estimate of the Pᵢ's also, my answer is: “Yes, and I have thought of it, but with my limited and primitive means of computation it was important to do first things first, and besides, this particular programme is not so easy to define, much less to carry out, as it may appear.” If Rao desires it, I shall undertake some computations along these lines. I may say in advance, however, that (1) whatever the results may turn out to be, they will not mitigate the results already obtained in estimating α, β, which have their own primary importance and (2) I do not anticipate an essential reversal of my previous conclusion of the general relative inefficiency of the maximum likelihood estimates compared with some minimum χ² estimates, in these experiments.
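The reduced problem described here, estimating α with β taken as known and reading off the E.D.50, can be sketched as follows. The assays mentioned used probits; this sketch swaps in the logistic form, and it assumes doses coded x = −1, 0, 1, a hypothetical slope β = 1.5 and hypothetical counts. With β fixed, the maximum likelihood estimate of α depends on the data only through Σr, and the E.D.50 (the coded dose at which P = 1/2) follows as −α̂/β.

```python
import math

def mle_alpha_beta_known(x, r, n, beta, iters=100):
    """MLE of alpha in P(x) = 1/(1 + exp(-(alpha + beta*x))) with beta treated as known.

    The score sum(r_i - n_i * P_i) involves the data only through sum(r);
    a finite estimate needs 0 < sum(r) < sum(n).
    """
    alpha = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        score = sum(ri - ni * pi for ri, ni, pi in zip(r, n, p))
        information = sum(ni * pi * (1.0 - pi) for ni, pi in zip(n, p))
        alpha += score / information  # Newton-Raphson step
    return alpha

x = [-1.0, 0.0, 1.0]   # coded log-doses (hypothetical)
n = [10, 10, 10]
r = [2, 5, 9]          # hypothetical responses
beta_known = 1.5       # hypothetical slope "known from previous experience"
alpha_hat = mle_alpha_beta_known(x, r, n, beta_known)
ed50 = -alpha_hat / beta_known  # coded dose at which P = 1/2
print(alpha_hat, ed50)
```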


We possess no principle of estimation the application of which ensures a best estimate in terms of the mean square error or any other objective operationally meaningful loss function. For the case of multinomial variation, I have defined an extended class of minimum χ² estimates which provides asymptotically efficient estimates [2], and this can frequently, but not always, be interpreted as estimates with approximately minimum variance in large samples. The maximum likelihood estimator can most simply be regarded as just one of the estimators in this class of minimum χ² estimators. For finite samples, really of any size but euphemistically referred to as small samples, we do not in general know which of these estimators has smallest mean square error. Certainly there is no reason to believe that the maximum likelihood estimator is necessarily the best. The only way to find out is to investigate. Let us not stifle investigation by assuming that we already know. For some cases, I have found that what I have called the minimum transform χ² estimate has smaller mean square error than either the maximum likelihood estimator or the minimum Pearson estimator, and, incidentally, smaller than the lower bound for the variance of an unbiased regular estimator, which was widely thought to be impossible. For a special case with the logistic function I found another estimator, the Rao-Blackwellized estimator, which has even smaller mean square error. Mr. Joseph Hodges and I [6] will present, at the forthcoming Berkeley Symposium, another estimator for the same case, which in a certain minimax sense is still better than the Rao-Blackwellized estimator. ... estimates can be developed for particular cases which have different operationally ... properties. We do not have to have a monolithic statistics. Let investigation flower along different paths, and let a thousand estimators bloom!
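The remark that an estimator can have mean square error smaller than the lower bound for the variance of an unbiased regular estimator can be checked exactly in a setting simpler than the logistic one. The sketch below assumes a single binomial sample of size n = 10 and compares the unbiased r/n with the shrunken estimator (r + 1)/(n + 2); the latter is an illustrative choice, not one of the estimators named in the text. Its exact mean square error falls below the bound p(1 − p)/n at p = 0.5 but exceeds it at p = 0.1, so which estimator is better depends on where the parameter lies.

```python
from math import comb

def exact_mse(estimator, n, p):
    """Exact mean square error of estimator(r, n) for a binomial(n, p) count r."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) * (estimator(r, n) - p) ** 2
               for r in range(n + 1))

def proportion(r, n):
    return r / n  # unbiased; its variance p(1 - p)/n attains the bound

def shrunk(r, n):
    return (r + 1) / (n + 2)  # biased toward 1/2 (illustrative choice)

n = 10
for p in (0.5, 0.1):
    bound = p * (1 - p) / n  # lower bound for the variance of an unbiased regular estimator
    print(p, bound, exact_mse(proportion, n, p), exact_mse(shrunk, n, p))
```

Exact enumeration over all n + 1 outcomes avoids simulation error, in the spirit of investigating rather than assuming.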


APPARENT ANOMALIES AND IRREGULARITIES IN M. L. ESTIMATION
Sir Ronald Fisher: Mr. Neyman surprised many of us by his claim in his recent memorandum that Edgeworth introduced the Method of Maximum Likelihood. Edgeworth in fact founded his method on the theory of inverse probability and ascribed his notion specifically to K. Pearson and Filon in 1898; the Method of Maximum Likelihood may equally be found in this paper, only Pearson and Filon were under the misapprehension that the errors of random sampling were the same as those of the Method of Moments regarded as axiomatic by these authors.


Edgeworth, however, ends his paper with the reservation that all that he had said referred only
to Measures of Central Tendency and not to the more complex problem of “The Fluctuation”. 


Mr. Karz read the following observations submitted by Mr. G. A. Barnard:


It seems to the writer that the so-called anomalies in maximum likelihood estimation arise from a misunderstanding of the problem which the method sets out to solve. The idea has grown up that the object of an estimation procedure is to find a single value for a parameter which may in some sense be regarded as “best”, given a set of data. Alternatively, an interval is required within which the true value of the parameter may be supposed to lie. Neither of these formulations corresponds with the requirements of scientific inference. These can, in the first place, be roughly specified as requiring a single value, to be regarded as “estimate”, and, indissolubly associated with it, some means of specifying the “error” to which this estimate is liable.
The method of maximum likelihood, in its simplest form, answers this requirement by giving as the “estimate” the point at which the log likelihood function has its maximum value, together with the inverse of the second derivative of this function at the maximum, which is used as an estimate of the error. This procedure may be “justified” in several ways, but perhaps the principal one can now be seen to consist in the facts: (1) that the log likelihood function is always minimal sufficient, so that for problems of the type considered we need only aim to specify this function; (2) that the log likelihood function is often well approximated in the neighbourhood of its maximum by a quadratic expression, so that a specification of the location of the maximum, together with the second derivative there, gives us a good idea of the general course of the function.


From this point of view it is evident that we may expect “anomalies” to arise when the log likelihood function is far from being parabolic, and it is trivial that such instances can be constructed, starting from non-anomalous cases, by sufficiently pathological transformations of the parameters. More serious difficulties may arise when it is the form of the probability (density) function of the observations which makes the parabolic approximations poor, as may arise, for example, with certain configurations of small samples from the Cauchy distribution. In such cases we should bear in mind the principle of serendipity, according to which, if we are lucky enough to have obtained a sample which happens to give a parabolic log likelihood function, we need not concern ourselves with the problem of what we should have done had we been less lucky. In other cases, where serendipity does not come to our aid, we may either follow the suggestion made many years ago by Fisher, of specifying higher derivatives of the log likelihood, or we may use the sequence of moments of the likelihood function, rather than the sequence of its Taylor coefficients, as the basis for our specification. In the case of the Cauchy distribution this would lead us to the Pitman estimator, though with an interpretation different from his, since we would think of it as associated with an “error” given by the second moment of the likelihood function, rather than as a “point estimate.”
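The two summaries of a likelihood contrasted here, the location of its maximum with the inverse of minus the second derivative there, and the moments of the likelihood function itself, can be computed side by side for a Cauchy location sample. The sketch assumes a hypothetical sample of five observations, unit scale, and plain grid integration over the location parameter; for a location parameter the mean of the normalized likelihood is the Pitman estimator.

```python
import math

# Hypothetical sample from a Cauchy distribution with unknown location, unit scale.
xs = [-1.2, 0.3, 0.8, 1.1, 5.0]

def loglik(t):
    return -sum(math.log1p((x - t) ** 2) for x in xs)

def score(t):
    return sum(2 * (x - t) / (1 + (x - t) ** 2) for x in xs)

def neg_second_derivative(t):
    # observed information: -d^2/dt^2 loglik = sum of 2(1 - d^2)/(1 + d^2)^2, d = x - t
    return sum(2 * (1 - (x - t) ** 2) / (1 + (x - t) ** 2) ** 2 for x in xs)

# Grid over the parameter, used both to locate the maximum and to integrate.
step = 0.01
grid = [-50 + step * i for i in range(10001)]
ll = [loglik(t) for t in grid]
top = max(ll)

# (1) maximum plus curvature: start at the grid maximum, polish with Newton steps
t_hat = grid[ll.index(top)]
for _ in range(20):
    t_hat += score(t_hat) / neg_second_derivative(t_hat)
curvature_se = 1 / math.sqrt(neg_second_derivative(t_hat))

# (2) moments of the normalized likelihood (flat weight on the location parameter)
weights = [math.exp(v - top) for v in ll]
total = sum(weights)
pitman = sum(t * w for t, w in zip(grid, weights)) / total
moment_sd = math.sqrt(sum((t - pitman) ** 2 * w for t, w in zip(grid, weights)) / total)

print("maximum and curvature:", t_hat, curvature_se)
print("likelihood mean and s.d.:", pitman, moment_sd)
```

When the log likelihood is close to parabolic the two pairs nearly agree; the outlying observation at 5.0 pulls them apart.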


The problem of approximating to the specification of the log likelihood function, by way of the form indicated, is thus seen to have the same limited degree of arbitrariness associated with it as do other problems of approximation of functions.


In certain particular contexts a practical decision problem may be represented as leading to what 
has been called the problem of point estimation, and in such cases the loss function and a Bayesian prior 
distribution require to be specified before a unique solution can be arrived at. The fact that the data enter
the solution of this problem through the likelihood function which they generate can be seen as another 
mode of justification of the likelihood approach. Evidently, under suitable regularity conditions, the 
solutions to wide classes of problems of this type could be seen to be estimators which are functions of the 
maximum likelihood estimator, together with the second and perhaps a few higher derivatives of the log 
likelihood function. 


The need for a simplified description of the likelihood function by means of parabolic approximations, 
or otherwise, can be thought of as considerably reduced by the possibility, now existent, of drawing contours 
of constant likelihood, for up to 3 unknown parameters with the help of automatic computers. A specimen of such a contour map (for the programme for which I am indebted to Mr. H. Whitfield of Imperial College) for the unknown parameters p₁, p₂ arising from the 2 × 2 table is attached. The effect of the skewness of the likelihood function for p₂ can be seen quite clearly and the limitations of the paraboloidal approximation are apparent. It is also evident that these limitations would be reduced considerably if the logistic transformation

$$a_1 = \log\frac{p_1}{1-p_1}, \qquad a_2 = \log\frac{p_2}{1-p_2}$$

were applied to the parameters.


        A    not-A    Total
        9      1        10      P(A) = p₁
        1     11        12      P(A) = p₂
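The gain from the logistic transformation can be checked numerically for a single binomial parameter. The sketch below assumes a hypothetical count of r = 1 success in n = 12 trials and compares the exact fall in log likelihood at one standard error either side of the maximum with the value −1/2 given by the parabolic approximation, first in the p scale and then in the scale a = log p/(1 − p). The helper names are illustrative.

```python
import math

def loglik_p(p, r, n):
    return r * math.log(p) + (n - r) * math.log(1 - p)

def loglik_logit(a, r, n):
    # the same binomial likelihood written in terms of a = log(p / (1 - p))
    return r * a - n * math.log1p(math.exp(a))

def parabola_error(loglik, theta_hat, se, r, n):
    """Largest gap, at theta_hat -/+ one standard error, between the exact fall in
    log likelihood and the value -1/2 given by the quadratic approximation."""
    top = loglik(theta_hat, r, n)
    return max(abs((loglik(theta_hat + z * se, r, n) - top) + 0.5) for z in (-1.0, 1.0))

r, n = 1, 12                                   # hypothetical counts
p_hat = r / n
se_p = math.sqrt(p_hat * (1 - p_hat) / n)      # 1 / sqrt(observed information) in p
a_hat = math.log(p_hat / (1 - p_hat))
se_a = 1 / math.sqrt(n * p_hat * (1 - p_hat))  # 1 / sqrt(observed information) in a

err_p = parabola_error(loglik_p, p_hat, se_p, r, n)
err_a = parabola_error(loglik_logit, a_hat, se_a, r, n)
print(err_p, err_a)  # roughly 1.74 and 0.17
```

In the p scale the exact fall at p̂ − SE is more than four times the parabolic value; in the logit scale the discrepancy is modest, in line with the observation about the contour map.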


All the essential ideas mentioned above seem to the present writer to have been implicit in Fisher's classical papers, and the only excuse for restating them here is that subsequent developments have shown that these classical papers have not always been studied with the attention they deserve.
Mr. Birnbaum: I would like to congratulate Mr. Rao on his presentation of very interesting contributions to the mathematical theory of estimation, and also to thank him for his clear statement of his general standpoint concerning the nature and purpose of the estimation problem. His view of estimation is a broad one, in which a single point estimate may be used in a variety of specific ways, some of them having the character of decision-making or specific inference problems, and some of them serving the purpose of efficient recording and interpretation of basic scientific or technical information of more general interest. Thus Mr. Rao's view of estimation is a kind of combination of the two standpoints presented by Messrs. Neyman and Barnard, and his general problem is that of showing how well a single point estimator can serve these broad and varied functions.

The standpoint which Mr. Neyman stated concisely is one upon which I based the first part of my own contribution here last week: in the problem of point-estimation, which formally includes confidence limit estimation, in general all possible estimators should be considered, and for a specified situation of application the choice of one estimator should (at least in principle) be based on comparisons of the probability distributions of all estimators. Such comparisons and choices may be informal or may utilize formal criteria, but they should reflect appropriately the situation of application and the statistician's purposes and judgements in the given situation; but the subject-matter of such comparisons and choices is basically those properties of estimators, represented by probabilities of errors of many kinds, which admit direct frequency interpretations. This standpoint leads in typical problems to large classes of admissible estimators, often including maximum likelihood estimators among many others. None of these admissible estimators can be eliminated from consideration as a matter of principle on the grounds mentioned; choices can be based only on grounds of specific judgements in specific problems and situations.


Mr. Barnard considers the point-estimation problem itself to be an incomplete and inadequate formulation of another inference problem. He states that the solution to this other problem is in principle the likelihood function itself, and that the role of the maximum likelihood point-estimate is simply to give a partial description of the likelihood function. What is this other problem whose solution is the likelihood function? I would call it the problem of informative inference, and define it as the problem of reporting efficiently, in meaningful objective terms, the statistical evidence, provided by an observed experimental outcome, which is relevant to the statistical hypotheses (possible parameter values) .... Although the term “statistical evidence” is not in common use in mathematical statistics, I think it should be, because it represents accurately an essential feature of many important applications of statistical techniques. What is the nature of statistical evidence, and what are its qualitative and quantitative properties? As a familiar example, when an outcome of a scientific ... leads to rejection of one statistical hypothesis in favour of another, on the basis of a test having very small probabilities of both types of errors, what seems most relevant and useful for typical ... is the interpretation of the outcome as strong evidence against the first hypothesis. ... tests are customarily interpreted in this way; one may wonder how often any ... techniques would be used in scientific research if they did not admit such interpretations, which we may call evidential interpretations, of outcomes. The objective basis for interpreting ... as strong evidence against a hypothesis is the small magnitude of its error-probabilities. For the purpose of evidential interpretation of one given outcome of a test, it ... the latter probabilities admit an objective frequency interpretation in the conceptual ...: relative frequencies which correspond to error-probabilities could in principle be realized ..., but will not be so realized in connection with the given experimental investigation; although ... interpretation of these probabilities is purely conceptual, it suffices to support the interpretation of the outcome as statistical evidence.


The full analysis of the nature and structure of statistical evidence, in such objective probabilistic terms, turns out to be a well-defined mathematical problem, as illustrated in the second part of my contribution here last week. Such analysis may be said to constitute a mathematical theory of informative statistical inference, and its subject-matter is quite distinct from intensities of belief or subjective probabilities. Such analysis leads to a certain central position of the likelihood function; and this analysis unfolds systematically, in objective probabilistic terms, the significance inherent in the likelihood function itself. Such analysis gives support to the claim that informative inference should in principle be based just on the likelihood function itself, and in my opinion eliminates the need for some of the other kinds of justification of this claim, mentioned by Mr. Barnard, which seem somewhat less direct; on the other hand, it seems essential to develop the general theory of informative inference, including complete explicit probabilistic interpretations of the statistical evidence provided by experiments of various mathematical forms. The simplest type of such interpretations is illustrated by the following example: when two simple hypotheses are considered, an outcome which gives the likelihood ratio statistic the value 99, regardless of the structure of the experiment in which it was obtained, has the same qualitative and quantitative properties, as evidence, as the outcome “reject” obtained by a statistical test having probabilities of errors of both kinds equal to .01. Such examples and interpretations illustrate the nature of the objective probabilistic bridge which can be constructed to connect systematically the two standpoints presented here by Mr. Neyman and Mr. Barnard.
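The arithmetic behind the likelihood ratio 99 can be checked directly: for the outcome “reject” of a test whose two error probabilities are both .01, its probability under the alternative divided by its probability under the null hypothesis is .99/.01 = 99. The sketch below also constructs one such test, assuming (hypothetically) a single observation and two unit-variance normal hypotheses, and confirms its size and power numerically.

```python
from statistics import NormalDist

alpha_err = beta_err = 0.01  # probabilities of errors of the first and second kinds

# Probability of the outcome "reject" under the alternative divided by that under the null.
ratio_for_reject = (1 - beta_err) / alpha_err
print(ratio_for_reject)  # about 99

# One test with these error probabilities (hypothetical): a single observation X,
# H0: X ~ N(0, 1) against H1: X ~ N(mu, 1), rejecting H0 when X > mu / 2.
z = NormalDist().inv_cdf(1 - alpha_err)
mu = 2 * z  # a cutoff halfway between the means gives equal error probabilities
size = 1 - NormalDist(0, 1).cdf(mu / 2)
power = 1 - NormalDist(mu, 1).cdf(mu / 2)
print(size, power, power / size)
```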


I believe that such analysis clarifies certain essential unities and certain essential differences between the two standpoints mentioned, and that it can throw further light on the possibilities and possible limitations of programmes, such as that of Mr. Rao, which aim to go as far as possible in developing a single type of inference method which will prove satisfactory from both of these standpoints.


Mr. Rao: It is my first duty to express thanks to all those who contributed to the discussion.
I have intentionally made some provocative statements in my paper to invite criticism necessary for a 
proper understanding of the issues involved. I think my plan has borne fruit. I would like to consider 
the various points raised in the discussion under a number of headings. The first one is historical. 


1. HISTORICAL ASPECTS


I must admit I have not touched on the historical aspects of the m.l. method adequately, as that would be outside the scope of the subject assigned to me. But since Mr. Neyman raised some historical issues in his discussion I have to answer them.


Mr. Neyman states, “as far as I am aware, the priority in the approach of m.l.e. ... belongs to F. Y. Edgeworth (1908a)”,10 a statement which Edgeworth himself would have contradicted as he attributed the method to Gauss, Laplace, and Pearson (footnotes on pages 384 and 395 of Edgeworth, 1908a). It also appears from Edgeworth's understanding of the earlier writers that the justification of m.l.e. consists in the inverse probability argument. Edgeworth (1908b, p. 500)11 himself supported this view and was also aware of the contradictions involved in assigning the same a priori probability distributions to different functions of parameters, but contended that the matter was not serious in large samples and for functions not out of the ordinary (p. 392, Edgeworth, 1908a). It is, indeed, surprising that Mr. Neyman, paraphrasing Edgeworth's work, asserts that the estimate obtained by the method of inverse probability (i.e., by maximising the a posteriori distribution) is in fact the m.l. estimate. If θ̂ is an m.l. estimate of θ, then φ(θ̂) is an m.l. estimate of any one-to-one function φ(θ), while such a property is not true of estimates obtained by the inverse probability argument.
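The contrast drawn here can be illustrated with a binomial sample. The sketch assumes hypothetical data (r = 3 successes in n = 10 trials) and a uniform prior on p, the usual choice in the inverse probability argument; the ternary-search helper is illustrative. Maximizing the likelihood over p or over the logit a = log p/(1 − p) gives the same p. The posterior mode under the uniform prior is also p = 0.3 when computed in the p scale, but when the same posterior is written as a density in a it acquires the Jacobian p(1 − p), and its mode corresponds to p = (r + 1)/(n + 2) = 1/3.

```python
import math

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def expit(a):
    return 1.0 / (1.0 + math.exp(-a))

r, n = 3, 10  # hypothetical binomial data

def loglik_p(p):
    return r * math.log(p) + (n - r) * math.log(1 - p)

def loglik_a(a):
    # the same likelihood written in terms of a = log(p / (1 - p))
    return r * a - n * math.log1p(math.exp(a))

def log_posterior_a(a):
    # uniform prior on p, posterior re-expressed as a density in a:
    # add the log of the Jacobian dp/da = p(1 - p)
    p = expit(a)
    return loglik_a(a) + math.log(p * (1 - p))

p_from_p = argmax_unimodal(loglik_p, 1e-9, 1 - 1e-9)
p_from_a = expit(argmax_unimodal(loglik_a, -20.0, 20.0))
p_from_posterior_a = expit(argmax_unimodal(log_posterior_a, -20.0, 20.0))
print(p_from_p, p_from_a, p_from_posterior_a)  # about 0.3, 0.3, 1/3
```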


As for Laplace's work, it is clear from the interpretation by Todhunter (1865, pp. 576, 585)12 that
Laplace never stressed the choice of “most probable result” nor did he justify its use in preference to any 
other method. I had no access to contributions by Gauss on this subject, but I take the liberty of quoting 
Mr. Barnard who thought that Gauss's justification of maximising the probability for estimation of para- 
meters is not free from inverse probability. 


10 Edgeworth, F. Y. (1908a): On the probable errors of frequency constants. JRSS, LXXI, 381.
11 Edgeworth, F. Y. (1908b): On the probable errors of frequency constants. JRSS, LXXI, 499.
12 Todhunter, I. (1865): A History of the Mathematical Theory of Probability. Chelsea Publishing Company, New York (1949 edition).
Karl Pearson (1896)13 used the m.l. method in estimating the correlation coefficient without offering justification (p. 125), but his subsequent work with Filon (1898)14 on standard errors of ‘frequency constants’ (estimates of parameters) did not show that he was considering m.l. estimates. It appears that they were attempting to compute the variance-covariance matrix of the asymptotic a posteriori distribution of the parameters, presumably derived from a uniform a priori distribution and a sufficiently large sample. The analysis was not, however, rigorous.15 We may thus infer that the authors were attempting to derive the asymptotic standard deviation of estimates which are the mean values of the a posteriori distribution or those obtained by maximising the a posteriori probability density of the parameters. It is, indeed, somewhat puzzling to note that Pearson used the same expressions to determine the standard errors of estimates obtained by the method of moments as well. It is also well known that Pearson did not advocate the use of m.l. in his subsequent writings.


A reference has also been made by Neyman to the result of Edgeworth, proved with the help of Love, that the most probable value (as determined by inverse probability) has the “smallest mean square deviation from the true point.” This, indeed, is a remarkable attempt although the class of alternative estimates was very much restricted and the estimation was confined to location and scale parameters. A different argument is necessary to establish this result for the estimate of any general parameter and under less restrictive conditions on the class of estimates. The result, however, is not true, as observed by Hodges and mentioned by Neyman in the present discussion, without any restriction on the class of estimates.


We, therefore, do not have any literature supporting prior claims to the method of m.l.e., as a principle capable of wide application and justifying its use on reasonable criteria (such as efficiency in a sense wider than that used by Edgeworth, and consistency) and not on inverse probability argument, before the fundamental contributions by Fisher in 1922 and 1925.


2. PHILOSOPHICAL STANDPOINT 


Mr. Neyman wants me to explain my philosophical standpoint on estimation. “Should m.l.e. always be used irrespective of the properties it may have?” If I understand correctly the spirit of this question and the emphasis on point estimation by Neyman (in the case of contamination of water) and by Berkson (in determining the concentration of a solution), I must differ from their philosophy quite sharply. I think for the estimation of contamination of water, instead of giving a point estimate, sufficiently overestimated and considered safe (in some sense), a statistician should ideally provide the customer with a whole series of inferences about the unknown value and the associated risks or consequences. For instance, in large samples under fairly general conditions, an estimate such as that obtained by m.l. together with its standard error estimable from the data themselves provides the complete answer. In small samples, mechanisms exist, under favourable circumstances, for providing fiducial probability statements or a whole series of fiducial limits in the sense of Fisher or confidence limits (interval, upper and lower) in the sense of Neyman.


I do not see how considerations of bias, under or over estimation arise. Again in the example of
estimation of variance of an instrument, Neyman suggests that I should preferably give an unbiased
estimate, if I have to escape “the lawsuit for damages by the manufacturer of the instrument.” Assuming
that a lawsuit is filed whenever there is an error in the estimate, an unbiased estimate can only give a
mental consolation that errors made, however large they are, even out in the long run, although heavy
damages may have to be paid every time! It must be noted that if one adopts the “minimum mean square
error” criterion for the choice of an estimate, the unbiased estimate may not even be admissible in the
sense of decision theory. If the damage to be paid is proportional to the square of the error, I should not
give an unbiased estimate.


13 Pearson, K. (1896): Mathematical contributions to the theory of evolution. III. Regression,
heredity and panmixia. Phil. Trans. Roy. Soc. London, Series A, 187, 253.

14 Pearson, K. and Filon, L. N. G. (1898): Mathematical contributions to the theory of evolution.
IV. On the probable errors of frequency constants and on the influence of random selection on variation and
correlation. Phil. Trans. Roy. Soc. London, Series A, 191, 229.

15 This problem is now under investigation and it is hoped to publish some of the results elsewhere.
I am aware that in some methodological problems, such as obtaining a pooled estimate by averaging
parallel estimates, one needs to consider unbiased or nearly unbiased estimates. This can be achieved, in
many cases, by a suitable adjustment of an available estimate.


What I maintain is that m.l.e. provides a convenient summary of data, demonstrably better than
other methods in large samples, for answering questions of interest concerning an unknown parameter,
and not that a point estimate obtained by maximising the likelihood is the answer to any specified question.
Neyman and Berkson were repeatedly asking me during the discussion whether I would suggest the m.l.
estimate in all situations. I do not know how the misunderstanding has arisen.


I am glad to note that Neyman looks upon “super efficient” estimates only as examples to show that
the definition of efficiency as the attainment of minimum asymptotic variance is void without some restric-
tion on the estimate. But I do not see why, in large samples, Hodges-LeCam “super efficient” estimates or the
“super efficient” estimate given in the present paper for the mean of a normal population should not be
preferred to $\bar{x}$, the sample mean (from a behaviouristic viewpoint). On the basis of decision theory, there
is, perhaps, justification in doing so, or at least $\bar{x}$ has no definite claims over the other. My objection is,
however, for other reasons. The super efficient estimate of the present paper is a function of the median
and the mean of a sample of observations and is, therefore, less useful than $\bar{x}$ for purposes of statistical
inference. The “super efficient” estimate of Hodges-LeCam is, however, equivalent to m.l.e. in large
samples, i.e., efficient in the sense defined in the present paper, and one may expect no substantial differ-
ence in the inferences associated with the two estimates in large samples. But it has certain defects.
For instance, its asymptotic standard deviation, being a discontinuous function of the unknown parameter,
does not admit reasonable estimation. Consequently, the inversion of a “super efficient” estimate for in-
ference on the unknown parameter becomes a little complicated.
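A small Monte Carlo sketch may make the last point concrete. It assumes the usual textbook form of the Hodges estimate, which the discussion does not spell out: for a sample of size $n$ from $N(\theta, 1)$, take $T = \bar{x}$ if $|\bar{x}| \ge n^{-1/4}$ and $T = \alpha\bar{x}$ otherwise, with $|\alpha| < 1$. The function names, $\alpha = 0$, the seed and the sample sizes are illustrative choices, not taken from the text. The scaled error $n\,E(T - \theta)^2$ comes out close to 1 for fixed $\theta \ne 0$ and close to $\alpha^2$ at $\theta = 0$, while at a value such as $\theta = 0.05$ with $n = 10^4$ it is far larger than either.

```python
import math
import random


def hodges_estimate(xbar: float, n: int, alpha: float = 0.0) -> float:
    """Hodges-type estimate: keep the sample mean unless |xbar| < n**(-1/4), then shrink it by alpha."""
    if abs(xbar) >= n ** -0.25:
        return xbar
    return alpha * xbar


def scaled_mse(theta: float, n: int, alpha: float = 0.0, reps: int = 4000, seed: int = 1) -> float:
    """Monte Carlo estimate of n * E(T - theta)**2 for samples of size n from N(theta, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # The mean of n independent N(theta, 1) observations is exactly N(theta, 1/n).
        xbar = rng.gauss(theta, 1.0 / math.sqrt(n))
        total += (hodges_estimate(xbar, n, alpha) - theta) ** 2
    return n * total / reps


if __name__ == "__main__":
    n = 10_000
    for theta in (0.0, 0.05, 0.5, 1.0):
        print(f"theta={theta:<5} n*MSE={scaled_mse(theta, n):.3f}")
```

The jump between $\theta = 0$ and fixed $\theta \ne 0$ is the discontinuity in the asymptotic standard deviation referred to above; the very poor behaviour just off $\theta = 0$ for finite $n$ is one way to see why that standard deviation is hard to estimate from the data.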


3. MINIMUM MEAN SQUARE ERROR 


It was not my intention to be unfair to Berkson in pointing out certain defects in the criterion of
minimum mean square error. The example due to Silverstone of estimating the probability of success
by the constant 1/2 may be of a special nature. But we have a number of examples to illustrate that smaller
variance does not necessarily mean higher concentration round the true value. Nor does it imply that
an estimate with a smaller variance provides a better discrimination between alternative values of the
parameter. Recently, at my suggestion, Sethuraman (1960)16 examined the relative powers of two statis-
tics $\xi$ and $2\xi + \eta$ (where $\xi$ and $\eta$ are the maximum and minimum respectively in a sample of size $n$ from
a rectangular population in the range $(\theta, 2\theta)$) for testing the hypothesis that $\theta = \theta_0$. Although, as an esti-
mate of $\theta$, the m.l. estimate $\xi/2$ has uniformly larger variance than $(2\xi + \eta)/5$, an alternative estimate,
it has better power as a test criterion for values of $\theta$ close to the assigned one. Since estimates with minimum
mean square error may not have other desirable properties, I was naturally inclined to ask Berkson about the
significance of, or the motivation for, the choice of this criterion.
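The variance comparison can be checked directly under the reading adopted here (population rectangular on $(\theta, 2\theta)$, with $\xi$ and $\eta$ the sample maximum and minimum). The sketch below uses the standard order-statistic formulas for a uniform range of width $w = \theta$, namely $\mathrm{Var}(\xi) = \mathrm{Var}(\eta) = w^2 n/\{(n+1)^2(n+2)\}$ and $\mathrm{Cov}(\xi, \eta) = w^2/\{(n+1)^2(n+2)\}$, and confirms them by simulation. It does not reproduce Sethuraman's power comparison; the function names, $\theta = 1$, $n = 20$ and the seed are illustrative.

```python
import random
import statistics


def exact_variances(theta: float, n: int) -> tuple[float, float]:
    """Exact variances of xi/2 and (2*xi + eta)/5 for a sample of size n from U(theta, 2*theta)."""
    denom = (n + 1) ** 2 * (n + 2)
    var_extreme = theta ** 2 * n / denom   # Var(max) = Var(min) for a range of width theta
    cov = theta ** 2 / denom               # Cov(max, min)
    var_ml = var_extreme / 4
    var_alt = (4 * var_extreme + var_extreme + 4 * cov) / 25
    return var_ml, var_alt


def simulated_variances(theta: float, n: int, reps: int = 20_000, seed: int = 7) -> tuple[float, float]:
    """Monte Carlo variances of the same two estimates."""
    rng = random.Random(seed)
    ml, alt = [], []
    for _ in range(reps):
        sample = [rng.uniform(theta, 2 * theta) for _ in range(n)]
        xi, eta = max(sample), min(sample)
        ml.append(xi / 2)
        alt.append((2 * xi + eta) / 5)
    return statistics.variance(ml), statistics.variance(alt)


if __name__ == "__main__":
    print("exact:    ", exact_variances(1.0, 20))
    print("simulated:", simulated_variances(1.0, 20))
```

Under these formulas the ratio of the two variances does not depend on $\theta$, and $\xi/2$ has the larger variance for every $n \ge 4$ (the condition reduces to $25n > 20n + 16$).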


Berkson observes that if he follows my philosophy on theory of estimation, it will be disastrous in
routine practice as in the use of a bio-assay for medical purposes, because one has to wait indefinitely col-
lecting more and more observations before a decision can be reached. I have not said that decisions should
not and cannot be made on the basis of available data, however meagre they are. But I am not convinced
that an estimate which has minimum mean square error will be of help in minimising the mortality among
his patients. I will only be too glad to accept Berkson’s procedure if the latter were to be true. I am
sure that for a statistical procedure to be made available for routine practice the approach should be some-
what different. Past data, as they accumulate, must be effectively used to improve the existing procedure.
The theory of estimation as developed by Fisher is most suitable for such situations. It is not claimed
anywhere that the m.l. estimates are minimal sufficient statistics, although they are explicit functions of
the latter. It may be seen that in the problem of fitting a logistic function (specified by two parameters


16 Sethuraman, J. (1960): Conflicting criteria of “goodness” of statistics. Sankhya, Series A
(in press).
$\alpha$ and $\beta$) to bio-assay ..., ... does provide minimal sufficient statistics in samples where the m.l.
... . If a suitable convention of specifying an estimate, like the one suggested ..., is
adopted in the case of samples for which the m.l. estimates of one or both of $\alpha$ and $\beta$
..., sufficiency of m.l. estimates can be claimed for all samples.17 The situation is not so simple
in the case of minimum logit $\chi^2$, advocated by Berkson. Generally, the estimates obtained by this method
are not sufficient. But Berkson insists on quoting one example with 3 doses and 10 animals at each dose,
in which case, with the application of a dubious rule such as 2n-th, the minimum logit $\chi^2$ estimate turns out
to be sufficient.


Berkson gives no argument in favour of mean square error, except that “it is a representative
loss function, and if, in a particular application, some other loss function suggests itself, let it be investi-
gated”. Suppose $a$ and $a^*$ are two alternative estimates of a parameter $\alpha$ such that

$$E(a - \alpha)^2 > E(a^* - \alpha)^2,$$

and there exists a function $\phi$ such that

$$E\,\phi(a - \alpha) < E\,\phi(a^* - \alpha);$$

then the estimate of $\alpha$ obtained by using one loss function is not good with respect to another loss
function. Such situations are not rare, and any number of examples with a reasonable choice of the
function $\phi$ can be given. In the problem of fitting the logistic function, I venture to suggest that some
increasing function of the differences between hypothetical and estimated probabilities of success or
failure at each dose may be a better indicator of the goodness of estimation than the deviations
in the estimates of the parameters $\alpha$ and $\beta$ themselves. I do not know whether a minimum logit $\chi^2$
estimate would have smaller expected loss than other types of estimates when loss functions of the type
indicated are considered.
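Situations of the kind described are easy to exhibit. The following toy sketch uses two hypothetical estimates $a$ and $a^*$ whose errors have simple distributions chosen only for illustration (they do not come from the discussion): $a - \alpha$ is uniform on $(-\sqrt{3}, \sqrt{3})$, while $a^* - \alpha$ is 0 with probability 0.9 and $\pm 3$ with probability 0.05 each. With exact rational arithmetic it shows $E(a - \alpha)^2 = 1 > 9/10 = E(a^* - \alpha)^2$, but $E\,\phi(a - \alpha) = 9/5 < 81/10 = E\,\phi(a^* - \alpha)$ for $\phi(e) = e^4$.

```python
from fractions import Fraction


def uniform_even_moment(c_squared: Fraction, k: int) -> Fraction:
    """E(e**k) for e uniform on (-c, c) and even k, which equals c**k / (k + 1)."""
    assert k % 2 == 0
    return c_squared ** (k // 2) / (k + 1)


def discrete_moment(dist: dict[int, Fraction], k: int) -> Fraction:
    """E(e**k) for an error taking finitely many values, given as {value: probability}."""
    return sum((p * Fraction(v) ** k for v, p in dist.items()), Fraction(0))


# Hypothetical error distributions of two estimates a and a* of the same parameter.
C_SQUARED_A = Fraction(3)  # a - alpha ~ U(-sqrt(3), sqrt(3)), stored through c**2 = 3
ERROR_A_STAR = {0: Fraction(9, 10), 3: Fraction(1, 20), -3: Fraction(1, 20)}  # a* - alpha

mse_a = uniform_even_moment(C_SQUARED_A, 2)
mse_a_star = discrete_moment(ERROR_A_STAR, 2)
quartic_a = uniform_even_moment(C_SQUARED_A, 4)  # loss phi(e) = e**4
quartic_a_star = discrete_moment(ERROR_A_STAR, 4)

if __name__ == "__main__":
    print("E(e^2):", mse_a, "vs", mse_a_star)          # a* has the smaller mean square error
    print("E(e^4):", quartic_a, "vs", quartic_a_star)  # but a has the smaller quartic loss
```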


4. OTHER ASPECTS


The views expressed by Barnard on point estimation, the role it plays in specifying the likelihood,
and its relation to a practical decision problem do not seem to be in conflict with those in my paper.
Both of us have tried to interpret Fisher's work on estimation, though not completely and not in exactly the
same way. I hope they will serve to remove some wrong notions about m.l. found in recent literature.


I am particularly interested in Birnbaum's contribution to the theory of estimation as it provides
a small sample justification to certain estimation procedures including the m.l. This is a far more difficult
task than what I have attempted to do, confining my remarks mainly to the case of large samples. I cannot
think of situations where serious decisions are taken on meagre evidence supplied by small samples, while
in routine practice such as the application of control charts in industry one may think of specifying rules of
action based even on very small samples to minimise certain risks in the long run. Further discussion on
small samples as attempted by Birnbaum would, no doubt, be of great value.


I would also wish to take the opportunity of mentioning a few results in connection with the in-
vestigation mentioned in the last paragraph of my paper. It was thought that no distinction could be
made in large samples among estimation procedures such as m.l., minimum chi-square, modified minimum
chi-square, etc., since they all provide asymptotically efficient estimates in the wider sense of $i_T/i \to 1$ ($i_T$ and $i$
are the informations per observation contained in the statistic $T$ and the sample respectively). But, as mentioned
by Fisher in the 1925 paper, differences in the actual amounts of information contained in different estimates
are more relevant. It has been possible to compute a quantity analogous to, if not the same as, the limiting
difference in the total information contained in the statistic and in the sample, and to establish that the m.l.
method has the least limiting loss. The minimum chi-square, modified minimum chi-square and other
related methods have a greater loss. The actual values are given by the author in a paper under print
in the Proceedings of the 4th Berkeley Symposium on Statistics and Probability.

17 The emphasis should be not on estimating the parameters $\alpha$ and $\beta$ but on probabilities of ...
at various doses. The parameter space has then to be properly defined in terms of these probabilities.
Once this is done, many difficulties mentioned by Dr. Berkson would disappear.
DENSITY IN THE LIGHT OF PROBABILITY THEORY
By E. M. PAUL 
Indian Statistical Institute 


SUMMARY. Let $p_1 = 2, p_2, p_3, \ldots$ be the prime numbers in ascending order. Let $(X_n)$ be a
sequence of measure spaces, each $X_n$ consisting of the points $0, 1, 2, 3, \ldots$. In $X_n$, we place mass
$\left(1 - \frac{1}{p_n}\right)\frac{1}{p_n^{r}}$ at the point $r$ $(r = 0, 1, 2, \ldots)$. We take the product space $X_1 \times X_2 \times X_3 \times \cdots$ and the
product measure $P$ in it. Each point of this space is an ‘infinite vector’ $(x_1, x_2, \ldots)$, the coordinates
being nonnegative integers.

Now let $S$ be any set of positive integers. By the upper magnification $M^u(S)$ of $S$, we mean the
set of vectors $(x_1, x_2, \ldots)$ such that $p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n} \in S$ for infinitely many values of $n$. By the lower
magnification $M_l(S)$ of $S$, we mean the set of vectors $(x_1, x_2, \ldots)$ such that $p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n} \in S$ for all
sufficiently large $n$. In this paper, we prove that $P[M_l(S)] \le \delta_l(S) \le \delta^u(S) \le P[M^u(S)]$, where $\delta_l(S)$
and $\delta^u(S)$ represent the lower and upper logarithmic densities of $S$, respectively.

Also, let $f$ be a real-valued function defined on the set of positive integers. We prove that if the
sequence of random variables $f(p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n})$, defined on the probability space $X_1 \times X_2 \times X_3 \times \cdots$, converges
with probability one to a random variable $g$, then $f$ has a distribution, namely, that of $g$; in defining the
distribution of $f$, we employ logarithmic density.


In this paper, we formulate the whole theory in an abstract framework. 
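The probability space of the summary is easy to simulate on finitely many coordinates. The sketch below is a minimal illustration, not part of the paper: it draws the first $K$ coordinates of a point of $X_1 \times X_2 \times \cdots$ under the product measure $P$ (coordinate $x_n$ takes the value $r$ with probability $(1 - 1/p_n)p_n^{-r}$) and maps the truncated vector to the integer $p_1^{x_1} \cdots p_K^{x_K}$. The helper names, $K = 5$ and the seeds are illustrative. Two cylinder probabilities serve as checks: $P(x_1 \ge 1) = 1/2$ and $P(x_1 = x_2 = 0) = (1/2)(2/3) = 1/3$.

```python
import random


def first_primes(k: int) -> list[int]:
    """First k primes by trial division (adequate for small k)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def sample_coordinate(p: int, rng: random.Random) -> int:
    """Draw r = 0, 1, 2, ... with probability (1 - 1/p) * p**(-r)."""
    r = 0
    while rng.random() < 1.0 / p:  # one more factor p with probability 1/p
        r += 1
    return r


def sample_vector(primes: list[int], rng: random.Random) -> list[int]:
    """First len(primes) coordinates of a point drawn from the product measure P."""
    return [sample_coordinate(p, rng) for p in primes]


def to_integer(vector: list[int], primes: list[int]) -> int:
    """The integer p_1**x_1 * ... * p_K**x_K attached to a truncated vector."""
    n = 1
    for x, p in zip(vector, primes):
        n *= p ** x
    return n


if __name__ == "__main__":
    rng = random.Random(2024)
    primes = first_primes(5)
    for _ in range(5):
        v = sample_vector(primes, rng)
        print(v, to_integer(v, primes))
```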


1. INTRODUCTION

The density of a set of positive integers has close connections with the general idea of
probability. Still, when one tries to employ the machinery of the modern theory of probability in the
investigation of density, many difficulties are encountered. The basic reason for this situation stems
from the fact that density is not, in general, countably additive. In this paper, we study density by
embedding the set of positive (or nonnegative) integers in a suitable probability space. In order
to be able to tackle different kinds of problems, we develop the theory axiomatically
and demonstrate some concrete situations to which the axiomatic theory is applicable.
2. THE SPACE X

We consider a sequence $(X_n)$ of abstract spaces. Each $X_n$ will consist of a
sequence of points $p_n^0, p_n^1, p_n^2, \ldots$. We form the product space $\prod_n X_n = X$. Thus
an element of $X$ will be a ‘vector’ $(p_1^{x_1}, p_2^{x_2}, \ldots)$ where each $x_n$ is a nonnegative integer.
We shall also denote this element by $p_1^{x_1} p_2^{x_2} \cdots$. However, we do not have any
algebraical operation in view. We may also denote the element $p_1^{x_1} p_2^{x_2} \cdots$ by $(x_1, x_2, \ldots)$.

$I$ will stand for the set of vectors (in $X$) having only a finite number of positive coordi-
nates. For $1 \le r_1 < r_2 < \cdots < r_k$ and $m_1, m_2, \ldots, m_k$ all $\ge 0$, by the set of vectors
in $I$ corresponding to $(p_{r_1}^{m_1}, p_{r_2}^{m_2}, \ldots, p_{r_k}^{m_k})$ we shall mean the set of those vectors in $I$
whose $r_1$-th coordinate is $m_1$, ..., $r_k$-th coordinate is $m_k$. We shall denote this set
by $S(p_{r_1}^{m_1}, \ldots, p_{r_k}^{m_k})$. We shall often denote the general vector $(x_1, x_2, x_3, \ldots)$ by $x$.


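We now associate with every subset $\sigma$ of $I$ a real number $\delta^u(\sigma)$ satisfying
Postulates (A) to (F); $\sigma'$ will stand for $(I - \sigma)$.

(A) $0 \le \delta^u(\sigma) \le 1$ for every $\sigma$.
(B) If $\sigma_1$ and $\sigma_2$ are two subsets of $I$ and $\sigma_1 \subset \sigma_2$, $\delta^u(\sigma_1) \le \delta^u(\sigma_2)$.
(C) If $\sigma_1$ and $\sigma_2$ are two disjoint sets,
$\delta^u(\sigma_1 \cup \sigma_2) \le \delta^u(\sigma_1) + \delta^u(\sigma_2)$
and $\delta^u(\sigma_1') + \delta^u(\sigma_2') \ge 1 + \delta^u(\sigma_1' \cap \sigma_2')$.
(D) $\delta^u(I) = 1$;
$\delta^u(\phi) = 0$, $\phi$ being the empty set ($\subset I$).

Before introducing Postulates (E) and (F), we frame

Definition 1: For every $\sigma$ ($\subset I$), we define $\delta_l(\sigma)$ to be $1 - \delta^u(\sigma')$.

Proposition 1: For every $\sigma$, $0 \le \delta_l(\sigma) \le \delta^u(\sigma) \le 1$.
By Postulates (C) and (D),
$1 = \delta^u(I) = \delta^u(\sigma \cup \sigma') \le \delta^u(\sigma) + \delta^u(\sigma')$.
So $\delta_l(\sigma) = 1 - \delta^u(\sigma') \le \delta^u(\sigma)$.
Since $\delta^u(\sigma') \le 1$, $1 - \delta^u(\sigma') \ge 0$, i.e. $\delta_l(\sigma) \ge 0$.
We note that $\delta_l(\phi) = 1 - \delta^u(I) = 1 - 1 = 0$; $\delta_l(I) = 1 - \delta^u(\phi) = 1$.

Proposition 2: If $\sigma_1 \subset \sigma_2$, $\delta_l(\sigma_1) \le \delta_l(\sigma_2)$.
Since $\sigma_1 \subset \sigma_2$, we have $\sigma_2' \subset \sigma_1'$, and so
$\delta^u(\sigma_2') \le \delta^u(\sigma_1')$, by Postulate (B). Hence
$1 - \delta^u(\sigma_2') \ge 1 - \delta^u(\sigma_1')$,
i.e. $\delta_l(\sigma_2) \ge \delta_l(\sigma_1)$.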
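Definition 2: If $\sigma$ is such that $\delta_l(\sigma) = \delta^u(\sigma)$, we denote the common value
by $\delta(\sigma)$.

Proposition 3: $\delta$ is finitely additive; that is, if $\sigma_1$ and $\sigma_2$ are two disjoint
subsets of $I$ such that $\delta(\sigma_1)$ and $\delta(\sigma_2)$ exist, then $\delta(\sigma_1 \cup \sigma_2)$ exists and $= \delta(\sigma_1) + \delta(\sigma_2)$.
Proof: By Postulate (C), $\delta^u(\sigma_1 \cup \sigma_2) \le \delta^u(\sigma_1) + \delta^u(\sigma_2)$, and the latter
$= \delta_l(\sigma_1) + \delta_l(\sigma_2) = 1 - \delta^u(\sigma_1') + 1 - \delta^u(\sigma_2') \le 2 - (1 + \delta^u(\sigma_1' \cap \sigma_2')) = 1 - \delta^u((\sigma_1 \cup \sigma_2)')$
$= \delta_l(\sigma_1 \cup \sigma_2)$.
Combined with Proposition 1, this gives $\delta(\sigma_1 \cup \sigma_2) = \delta_l(\sigma_1 \cup \sigma_2) = \delta^u(\sigma_1 \cup \sigma_2) = \delta(\sigma_1) + \delta(\sigma_2)$.

Proposition 4: If $(\sigma_n)$ is a sequence of disjoint subsets of $I$ and each $\sigma_n$ has a $\delta$,
$\delta_l\left(\bigcup_n \sigma_n\right) \ge \sum_n \delta(\sigma_n)$.
Proof: By Proposition 3, $\bigcup_{n=1}^{m} \sigma_n$ has $\delta = \sum_{n=1}^{m} \delta(\sigma_n)$, for every $m$.
Now $\bigcup_n \sigma_n \supset \bigcup_{n=1}^{m} \sigma_n$.
So by Proposition 2, $\delta_l\left(\bigcup_n \sigma_n\right) \ge \delta_l\left(\bigcup_{n=1}^{m} \sigma_n\right) = \delta\left(\bigcup_{n=1}^{m} \sigma_n\right) = \sum_{n=1}^{m} \delta(\sigma_n)$.
Since this is true for all $m$, $\delta_l\left(\bigcup_n \sigma_n\right) \ge \sum_{n=1}^{\infty} \delta(\sigma_n)$.

Proposition 5: If $\sigma$ ($\subset I$) is such that $\delta^u(\sigma) + \delta^u(\sigma') = 1$, then $\delta(\sigma)$ exists.
Proof: $\delta_l(\sigma) + \delta_l(\sigma') = 1 - \delta^u(\sigma') + 1 - \delta^u(\sigma) = 1$.
So $\delta^u(\sigma) + \delta^u(\sigma') = 1$ and $\delta_l(\sigma) + \delta_l(\sigma') = 1$.
So by Proposition 1, $\delta_l(\sigma) = \delta^u(\sigma)$ and $\delta_l(\sigma') = \delta^u(\sigma')$.
So $\delta(\sigma)$ and $\delta(\sigma')$ exist.

Corollary: If $\sigma$ is such that $\delta_l(\sigma) + \delta_l(\sigma') = 1$, then $\delta(\sigma)$ exists. For then
$1 - \delta^u(\sigma') + 1 - \delta^u(\sigma) = 1$, i.e.
$\delta^u(\sigma) + \delta^u(\sigma') = 1$, and Proposition 5 applies.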

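Postulate (E): For $1 \le r_1 < r_2 < \cdots < r_k$ and $m_1, \ldots, m_k$ all $\ge 0$,
$\sigma = S(p_{r_1}^{m_1}, \ldots, p_{r_k}^{m_k})$ has $\delta_l(\sigma) = \delta^u(\sigma) = \delta(\sigma)$. We shall denote the $\delta$ associated
with $S(p_1^{m_1}, \ldots, p_k^{m_k})$ by $\delta_{m_1 \cdots m_k}$.

Postulate (F): For $m \ge 1$, $\sum \delta(\sigma) = 1$, where $\sigma = S(p_1^{n_1}, p_2^{n_2}, \ldots, p_m^{n_m})$, the
summation extending over all vectors $(n_1, n_2, \ldots, n_m)$ with nonnegative integral
coordinates.

Proposition 6: Let $0 \le \beta_1 < \beta_2 < \cdots$ be a finite or infinite sequence of
integers. Then $\bigcup_m S(p_1^{\beta_m})$ has a $\delta = \sum_m \delta[S(p_1^{\beta_m})]$.
Proof: If there are only finitely many $\beta$'s, the result is immediate since $\delta$
is finitely additive (Proposition 3). So suppose there are infinitely many $\beta$'s.
Let $\Gamma = (\gamma_1 < \gamma_2 < \cdots)$ be the set of nonnegative integers complementary to the
set of $\beta$'s.
By Proposition 4,
$\delta_l\left[\bigcup_m S(p_1^{\beta_m})\right] \ge \sum_m \delta[S(p_1^{\beta_m})]$. ... (2.1)
Similarly, $\delta_l\left[\bigcup_m S(p_1^{\gamma_m})\right] \ge \sum_m \delta[S(p_1^{\gamma_m})]$. ... (2.2)
Now by Postulate (F), $\sum_m \delta[S(p_1^{\beta_m})] + \sum_m \delta[S(p_1^{\gamma_m})] = 1$. ... (2.3)
Write $A = \bigcup_m S(p_1^{\beta_m})$ and $B = \bigcup_m S(p_1^{\gamma_m})$. From (2.1), (2.2) and (2.3) we get
$\delta_l(A) + \delta_l(B) \ge 1$,
$1 - \delta^u(B) + 1 - \delta^u(A) \ge 1$,
$\delta^u(A) + \delta^u(B) \le 1$.
Now $A$ and $B$ are complementary sets. So by Postulate (C),
$\delta^u(A) + \delta^u(B) \ge \delta^u(I) = 1$.
So $\delta^u(A) + \delta^u(B) = 1$. ... (2.4)
So by Proposition 5, $\delta(A)$ and $\delta(B)$ exist. Writing $x$ and $y$ for the two sums in (2.3), we write (2.1) and (2.2) as
$\delta(A) \ge x$,
$\delta(B) \ge y$.
(2.3) becomes $x + y = 1$.
(2.4) becomes $\delta(A) + \delta(B) = 1$.
So $\delta(A) = x$, $\delta(B) = y$, i.e.
$\delta\left[\bigcup_m S(p_1^{\beta_m})\right] = \sum_m \delta[S(p_1^{\beta_m})]$.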


On the basis of the foregoing six postulates we introduce a probability distri-
bution in the space $X$. In the space $X_1$, we place at $p_1^{n_1}$ mass $= \delta[S(p_1^{n_1})]$, $n_1 =
0, 1, 2, 3, \ldots$. By Postulate (F), the total mass is 1. Similarly in the space $X_1 \times X_2$,
we place at $(p_1^{n_1}, p_2^{n_2})$ mass $= \delta[S(p_1^{n_1}, p_2^{n_2})]$, $n_1, n_2 = 0, 1, 2, 3, \ldots$.

In this way we get a probability distribution in $X_1 \times X_2 \times \cdots \times X_k$-space for $k = 1, 2, 3, \ldots$.
These distributions are mutually consistent. For example, we shall prove that the
distribution in $X_1 \times X_2$ is consistent with that in $X_1$.
$S(p_1^{m_1}) = S(p_1^{m_1}, p_2^{0}) \cup S(p_1^{m_1}, p_2^{1}) \cup \cdots$.
By (B), and by Proposition 3,
$\delta_{m_1} \ge \delta_{m_1 0} + \delta_{m_1 1} + \cdots$ to $n$ terms,
$\delta_{m_1} \ge \delta_{m_1 0} + \delta_{m_1 1} + \cdots$ ad inf.
This is true for $m_1 = 0, 1, 2, 3, \ldots$. So adding these inequalities,

$$\sum_{m_1=0}^{\infty} \delta_{m_1} \ge \sum_{m_1=0}^{\infty} \left( \sum_{m_2=0}^{\infty} \delta_{m_1 m_2} \right).$$

Now by Postulate (F), each extreme member has value 1. So $\delta_{m_1} = \delta_{m_1 0} + \delta_{m_1 1} + \cdots$
ad inf, for each $m_1 = 0, 1, 2, \ldots$. In this way, we see that the distributions in
$X_1$, $X_1 \times X_2$, $X_1 \times X_2 \times X_3$, ... are all consistent. So by Kolmogorov's Theorem (1956, p. 27),
we get a unique probability distribution in $X$.
We shall denote this measure by $P$.


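Proposition 7: Let $1 \le r_1 < r_2 < \cdots < r_k$. Let $(m_1^{(n)}, m_2^{(n)}, \ldots, m_k^{(n)})$, $n = 1, 2, \ldots$,
be a finite or infinite set of points in the $X_{r_1} \times X_{r_2} \times \cdots \times X_{r_k}$-space. Then

$$\delta\left[\bigcup_n S\left(p_{r_1}^{m_1^{(n)}}, \ldots, p_{r_k}^{m_k^{(n)}}\right)\right] = \sum_n P\left[c\left(p_{r_1}^{m_1^{(n)}}, \ldots, p_{r_k}^{m_k^{(n)}}\right)\right],$$

where $c(p_{r_1}^{m_1}, \ldots, p_{r_k}^{m_k})$ denotes the cylinder set formed by vectors having $m_1$ as
the $r_1$-th coordinate, ..., and $m_k$ as the $r_k$-th coordinate.

Proof: We observe that $c(p_{r_1}^{m_1}, \ldots, p_{r_k}^{m_k})$ may be looked upon as the
union of countably many disjoint cylinder sets, each of these sets having as base a
single point in the $X_1 \times X_2 \times X_3 \times \cdots \times X_{r_k}$ space. The rest of the proof is exactly parallel to that
of Proposition 6.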


3. THE MAGNIFICATION THEOREM

Let $S$ be any subset of $I$. The set $M^u$ of vectors $(x_1, x_2, \ldots) \in X$ such that
$(x_1, x_2, \ldots, x_k, 0, 0, \ldots) \in S$ for infinitely many values of $k$ will be called the upper magni-
fication of $S$. The set $M_l$ of vectors $(x_1, x_2, \ldots) \in X$ such that $(x_1, \ldots, x_k, 0, 0, \ldots) \in S$
for all sufficiently large values of $k$ will be called the lower magnification of $S$. Clearly,
$M_l \subset M^u$, and both sets are measurable.

We also employ the notations $M_l(S)$ and $M^u(S)$. If $M_l(S) = M^u(S)$, we
shall denote it by $M(S)$.
Let $S' = I - S$. Then $M_l(S') = X - M^u(S)$ and $M^u(S') = X - M_l(S)$.

We now introduce two conditions. Let $I_k$ be the subset of $I$ consisting of
vectors having nonzero coordinates only in the first $k$ places (at most).
Condition G: $\delta(I_k) = 0$, $k = 1, 2, 3, \ldots$. We shall call a subset $S$ of $I$ right-
complete in case $(x_1, x_2, \ldots, x_n, 0, 0, \ldots) \in S$ implies $(x_1, \ldots, x_n, x_{n+1}, \ldots, x_{n+m}, 0, 0, \ldots) \in S$
for every $m > 0$ and arbitrary nonnegative integers $x_{n+1}, \ldots, x_{n+m}$. It is easy to
verify that if $S$ is right-complete, $M_l(S) = M^u(S)$. We note that a right-complete
set containing $(0, 0, 0, \ldots)$ must be the whole of $I$.

Condition H: Every right-complete set $T$ is such that $\delta(T)$ exists and
$\delta(T) = P[M(T)]$.

We shall formulate condition H in a slightly different form. We shall call a
subset $T$ of $I$ left-complete in case $(x_1, x_2, \ldots, x_n, 0, 0, \ldots) \in T$ implies $(x_1, \ldots, x_{n-1}, 0, 0, 0, \ldots) \in T$
for $n = 1, 2, 3, \ldots$. Thus every nonempty left-complete set contains the vector
$(0, 0, 0, \ldots)$. If $T$ is left-complete, $M_l(T) = M^u(T)$. If $T$ is left-complete, $(I - T)$
is right-complete and vice versa. Condition H is equivalent to condition H$_1$.

Condition H$_1$: Every left-complete set $T$ is such that $\delta(T)$ exists and
$\delta(T) = P[M(T)]$.
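Theorem 1: If conditions G and H hold,
$P[M^u(S)] \ge \delta^u(S)$,
where $S$ is an arbitrary subset of $I$.

Proof: For $k = 1, 2, 3, \ldots$, let $G_k$ be the set of vectors $(x_1, x_2, x_3, \ldots)$ such
that $(x_1, \ldots, x_k, 0, 0, \ldots)$, $(x_1, \ldots, x_{k+1}, 0, 0, \ldots)$, $(x_1, \ldots, x_{k+2}, 0, 0, \ldots)$, ... all $\in S' = I - S$.
We easily verify that $G_k \subset G_{k+1}$, for $k = 1, 2, \ldots$, and that
$\bigcup_k G_k = X - M^u(S)$. Take any $\epsilon > 0$. Let $k(\epsilon) = k$ be such that $P[G_k] > P[X - M^u(S)] - \epsilon$.
Consider the set $B_\epsilon$ of vectors $(0, 0, 0, \ldots)$, $(x_1, 0, 0, \ldots)$, $(x_1, x_2, 0, 0, \ldots)$, ...,
where the vector $(x_1, x_2, x_3, \ldots)$ runs through $G_{k(\epsilon)}$.
$B_\epsilon$ is a left-complete set. So $\delta(B_\epsilon)$ exists and is $= P[M(B_\epsilon)] \ge P[G_{k(\epsilon)}] > P[X - M^u(S)] - \epsilon$.
Now let $D_\epsilon$ be the set of vectors $(x_1, \ldots, x_k, 0, 0, \ldots)$, $(x_1, \ldots, x_{k+1}, 0, 0, \ldots)$,
$(x_1, \ldots, x_{k+2}, 0, 0, \ldots)$, ..., where $(x_1, x_2, x_3, \ldots)$ runs through $G_{k(\epsilon)}$. Then
$\delta_l(D_\epsilon) = \delta_l(B_\epsilon)$ by condition G and Postulate (C). So $\delta_l(S') \ge \delta_l(D_\epsilon) = \delta_l(B_\epsilon) = \delta(B_\epsilon)$
$> P[X - M^u(S)] - \epsilon$. Since $\epsilon > 0$ is arbitrary,
$\delta_l(S') \ge P[X - M^u(S)]$,
$\delta^u(S) = 1 - \delta_l(S') \le P[M^u(S)]$.

Corollary (The Magnification Theorem): If conditions G and H hold, then
for every subset $S$ of $I$,
$P[M_l(S)] \le \delta_l(S) \le \delta^u(S) \le P[M^u(S)]$.

Proof: We apply Theorem 1 to $S' = I - S$ and get $P[M^u(S')] \ge \delta^u(S')$.
But $M^u(S') = X - M_l(S)$ and $\delta_l(S) = 1 - \delta^u(S')$, so that $P[M_l(S)] \le \delta_l(S)$; the middle inequality is Proposition 1.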
In view of the importance of condition H, we shall examine the structure of
right-complete sets. Let $S$ be a right-complete set. If $(0, 0, \ldots) \in S$, $S = I$. So let
us suppose $(0, 0, 0, \ldots) \notin S$. We shall call $(x_1, \ldots, x_n, 0, 0, 0, \ldots)$ a basic vector of (or in)
$S$ in case $(x_1, \ldots, x_n, 0, 0, \ldots) \in S$ but $(x_1, \ldots, x_{n-1}, 0, 0, \ldots) \notin S$; if $n = 1$, we interpret
$(x_1, \ldots, x_{n-1}, 0, 0, \ldots)$ as $(0, 0, 0, \ldots)$. If $(x_1, \ldots, x_n, 0, 0, \ldots)$ is a basic vector, $x_n > 0$.
If $v \in S$, there is a unique basic vector $\xi = (x_1, \ldots, x_n, 0, 0, \ldots)$, $n \ge 1$, such
that $v = (x_1, \ldots, x_n, y_{n+1}, \ldots, y_{n+m}, 0, 0, 0, \ldots)$ for some $m \ge 0$. Conversely,
if $(x_1, \ldots, x_n, 0, 0, \ldots)$ is a basic vector of $S$, $(x_1, \ldots, x_n, y_{n+1}, \ldots, y_{n+m}, 0, 0, 0, \ldots) \in S$, where
$m, y_{n+1}, \ldots, y_{n+m}$ are any nonnegative integers. Thus the basic vectors of $S$ completely
determine $S$. Let the basic vectors of $S$ be $\xi_1, \xi_2, \ldots$, where $\xi_r = (x_{r1}, \ldots, x_{rn_r}, 0, 0, \ldots)$,
$n_r \ge 1$. Then $M(S) = \bigcup_r C_r$, where $C_r$ is the cylinder set formed by vectors whose
first $n_r$ coordinates are $x_{r1}, \ldots, x_{rn_r}$ in that order; $C_i \cap C_j$ is empty if $i \ne j$.
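The decomposition $M(S) = \bigcup_r C_r$ suggests computing $P[M(S)]$ by listing basic vectors and adding up cylinder probabilities. The sketch below does this by a depth-first search over prefixes, under assumptions that are not in the text: the measure is the product measure of Section 6 (mass $(1 - 1/p_n)p_n^{-r}$ at $r$), only the first $K$ coordinates are searched, exponents are capped at an illustrative `max_exp`, and the search is exhaustive, so it is practical only for small $K$. The test set is the set of non-squarefree integers (some coordinate $\ge 2$), which is right-complete; its basic vectors are the prefixes $(x_1, \ldots, x_n)$ with $x_1, \ldots, x_{n-1} \in \{0, 1\}$ and $x_n \ge 2$, and their cylinder probabilities add up to $1 - \prod_{k \le K}(1 - p_k^{-2})$, which tends to $1 - 6/\pi^2$.

```python
import math
from typing import Callable


def first_primes(k: int) -> list[int]:
    """First k primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def measure_of_magnification(in_s: Callable[[tuple[int, ...]], bool],
                             primes: list[int], max_exp: int = 40) -> float:
    """Sum P(C_r) over the basic vectors of a right-complete set S (truncated search).

    in_s(prefix) says whether the vector (x_1, ..., x_n, 0, 0, ...) lies in S.
    Only basic vectors with n <= len(primes) and exponents <= max_exp are visited.
    """
    if in_s(()):
        return 1.0  # a right-complete set containing (0, 0, ...) is all of I
    total = 0.0
    stack = [((), 1.0)]  # prefixes not in S, with the probability of their cylinder
    while stack:
        prefix, prob = stack.pop()
        n = len(prefix)
        if n == len(primes):
            continue  # truncation point: deeper basic vectors are ignored
        p = primes[n]
        for x in range(max_exp + 1):
            child = prefix + (x,)
            child_prob = prob * (1 - 1 / p) * p ** -x
            if in_s(child):
                total += child_prob  # first prefix in S: a basic vector, add P(C_r)
            else:
                stack.append((child, child_prob))
    return total


def non_squarefree(prefix: tuple[int, ...]) -> bool:
    """p_1**x_1 ... p_n**x_n is divisible by the square of a prime."""
    return any(x >= 2 for x in prefix)


if __name__ == "__main__":
    primes = first_primes(12)
    print(measure_of_magnification(non_squarefree, primes), 1 - 6 / math.pi ** 2)
```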


4. DISTRIBUTIONS OF ARITHMETICAL FUNCTIONS

We refer to Section 2. We have the space $X$ and the probability measure
$P$ in it.
Let $f$ be a finite real-valued function defined on the subset $I$ of $X$. $f$ will be
said to have the distribution $Q$ in case $Q$ is a probability distribution on $(-\infty, \infty)$ and,
for every $c$ such that $Q(c) = 0$ ($Q(c)$ denoting the mass that $Q$ assigns to the point $c$),
the set $E(n \in I \text{ and } f(n) < c)$ has $\delta = Q(-\infty, c)$.
If $f$ has a distribution, it is unique, since two distributions are identical in case they
assign the same measure to $(-\infty, c)$ for every common continuity point $c$.
Suppose $f$ is a real finite-valued function defined in the set $I$.

Theorem 2: Suppose $g(x) = \lim_{m \to \infty} f(p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m})$ exists at almost all points
$x = (x_1, x_2, \ldots)$. Moreover, let $\delta^u$ satisfy conditions G and H of Section 3. Then $f$ has a
distribution and this is the same as the distribution of $g(x)$.

Proof: Let $Q$ be the distribution of $g(x)$. Let $c$ be such that $Q(c) = 0$. Let
$E_c$ be the set of points $x \in X$ such that $g(x) < c$. Let $S_c$ be the set of vectors $n \in I$
such that $f(n) < c$. Suppose $x = (x_1, x_2, x_3, \ldots) \in E_c$. Then $\lim f(p_1^{x_1} \cdots p_m^{x_m}) < c$;
so for all sufficiently large values of $m$, $f(p_1^{x_1} \cdots p_m^{x_m}) < c$. So for all sufficiently large
values of $m$, $p_1^{x_1} p_2^{x_2} \cdots p_m^{x_m} \in S_c$. Thus $(x_1, x_2, \ldots) \in M_l(S_c)$. So $E_c \subset M_l(S_c)$, and
$\delta_l(S_c) \ge P[M_l(S_c)] \ge P(E_c) = Q(-\infty, c)$.
Similarly, if $T_c$ is the set of vectors $n \in I$ such that $f(n) > c$, $\delta_l(T_c) \ge Q(c, \infty)$. Hence
$\delta^u(S_c) = 1 - \delta_l(S_c') \le 1 - \delta_l(T_c) \le 1 - Q(c, \infty) = Q(-\infty, c)$, since $Q(c) = 0$.
But we have $\delta_l(S_c) \ge Q(-\infty, c)$. So $\delta(S_c) = Q(-\infty, c)$, and $f$ has the distribution $Q$.
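An illustration of Theorem 2, with a choice of $f$ that is ours rather than the paper's: $f(n) = \varphi(n)/n$, Euler's function divided by $n$. Along any vector, $f(p_1^{x_1} \cdots p_m^{x_m}) = \prod_{k \le m,\ x_k \ge 1}(1 - 1/p_k)$ is nonincreasing in $m$, so $g(x) = \prod_{k:\ x_k \ge 1}(1 - 1/p_k)$ exists at every point. Under $P$ the events $x_k \ge 1$ are independent with probabilities $1/p_k$, so $E\,g = \prod_k(1 - 1/p_k^2) = 6/\pi^2$. Granting conditions G and H (which Section 6 verifies for logarithmic density), the theorem says that $\varphi(n)/n$ has the distribution of $g$ in the logarithmic-density sense. The sketch samples $g$ from the first $K = 200$ coordinates (a truncation assumption) and checks its mean; the names and the seed are illustrative.

```python
import math
import random


def first_primes(k: int) -> list[int]:
    """First k primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def phi(n: int) -> int:
    """Euler's totient by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def g_from_vector(vector: list[int], primes: list[int]) -> float:
    """phi(n)/n for n = p_1**x_1 ... p_K**x_K: product of (1 - 1/p_k) over coordinates with x_k >= 1."""
    value = 1.0
    for x, p in zip(vector, primes):
        if x >= 1:
            value *= 1 - 1 / p
    return value


def sample_g(primes: list[int], rng: random.Random) -> float:
    """Draw g(x) for x ~ P on the first len(primes) coordinates; only the events x_k >= 1 matter."""
    value = 1.0
    for p in primes:
        if rng.random() < 1 / p:  # P(x_k >= 1) = 1/p_k under the product measure
            value *= 1 - 1 / p
    return value


def mean_of_g(k: int = 200, reps: int = 10_000, seed: int = 3) -> float:
    """Monte Carlo mean of g over reps draws, truncated to the first k primes."""
    rng = random.Random(seed)
    primes = first_primes(k)
    return sum(sample_g(primes, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    print(mean_of_g(), 6 / math.pi ** 2)
```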
5. INTEGRATION WITH RESPECT TO DENSITY

Suppose $f$ is a bounded real-valued function defined on the set $I$ ($\subset X$). Let
$A_1, \ldots, A_n$ be disjoint subsets of $I$ such that $\bigcup_i A_i = I$ and such that $\delta(A_i)$ exists for
$i = 1, \ldots, n$; let us call this partition $\mathcal{P}$. Let

$$S(\mathcal{P}) = \sum_{i=1}^{n} \delta(A_i) \sup_{x \in A_i} f(x), \qquad s(\mathcal{P}) = \sum_{i=1}^{n} \delta(A_i) \inf_{x \in A_i} f(x).$$

As usual, we call the lower bound of $S(\mathcal{P})$ (over all such partitions) the upper integral of $f$ with respect
to $\delta$ and denote it by $\overline{\int} f\,d\delta$. Similarly, the upper bound of $s(\mathcal{P})$ is the lower integral $\underline{\int} f\,d\delta$.

Theorem 3: Let $f$ be any bounded real-valued function defined on $I$.
For every right-complete subset $S$ of $I$, let $\delta(S)$ exist and satisfy $P[M(S)] = \delta(S)$.
For each $x = (x_1, x_2, \ldots) \in X$, let $g(x) = \mathrm{L.U.B.}_{n \ge 1} f(x_1, \ldots, x_n, 0, 0, \ldots)$. Then

$$\overline{\int} f\,d\delta \le \int_X g\,dP.$$

Proof: Suppose $-K < g < +K$. Take any $\epsilon > 0$.
Partition $(-K, +K)$ into a finite number of sub-intervals in such a way that (i) the end-
points of the sub-intervals are all continuity points in the distribution of the random
variable $g(x)$, and (ii) every approximative sum for $\int_X g(x)\,dP$ formed on the basis of
this partition lies between $\int_X g\,dP - \epsilon$ and $\int_X g\,dP + \epsilon$. Let $(y_k, y_{k+1})$ be a typical sub-
interval in this partition; let $A_k$ be the subset of $X$ on which $g(x) > y_k$. $A_k \cap I$ is
a right-complete set, and $A_k$ is the magnification of this set. Hence

$$\delta[(A_k - A_{k+1}) \cap I] = P(A_k - A_{k+1}) = P\{y_k < g(x) \le y_{k+1}\}.$$

Now partition $I$ into the subsets $(A_k - A_{k+1}) \cap I$. Since $f(n) \le g(n) \le y_{k+1}$ on $(A_k - A_{k+1}) \cap I$, the upper approximate sum for $\int f\,d\delta$
given by this partition is

$$\le \sum_k y_{k+1}\,\delta[(A_k - A_{k+1}) \cap I] = \sum_k y_{k+1}\,P\{y_k < g(x) \le y_{k+1}\}.$$

Now the latter expression is $\le \int_X g\,dP + \epsilon$. Since $\epsilon > 0$ is arbitrary,
$\overline{\int} f\,d\delta \le \int_X g\,dP$.


In the foregoing theorem, $g(x)$ was defined as $\mathrm{L.U.B.}_{n \ge 1} f(x_1, \ldots, x_n, 0, 0, \ldots)$.
We shall show that if condition G also holds, the proof will hold if $g(x)$ is defined as
$\mathrm{L.U.B.}_{n \ge m} f(x_1, \ldots, x_n, 0, 0, \ldots)$, where $m$ is any positive integer. Taking the limit as
$m \to \infty$, we get the

Corollary: Under conditions G and H,

$$\overline{\int} f\,d\delta \le \int_X \Bigl\{\limsup_{n \to \infty} f(x_1, \ldots, x_n, 0, 0, \ldots)\Bigr\}\,dP.$$

Similarly

$$\underline{\int} f\,d\delta \ge \int_X \Bigl\{\liminf_{n \to \infty} f(x_1, \ldots, x_n, 0, 0, \ldots)\Bigr\}\,dP.$$

We now give the proof for the case where $g(x)$ is defined as $\mathrm{L.U.B.}_{n \ge m} f(x_1, \ldots, x_n, 0, 0, \ldots)$.
As before, we define $A_k$ as $E_x[g(x) > y_k]$. But now $A_k \cap I$ may not be
right-complete. For example, it may happen that $f(1, 0, 0, 0, \ldots) > y_k$ but for every
other vector with 1 as first coordinate, $f$ is $\le y_k$.

Fortunately, we have $M_l(A_k \cap I) = M^u(A_k \cap I) = A_k$. First we prove that
$A_k \subset M_l(A_k \cap I)$. Let $(x_1, x_2, \ldots) \in A_k$. Then $f(x_1, \ldots, x_n, 0, 0, \ldots) > y_k$ for at least
one $n \ge m$. Then for every positive integer $r$, $(x_1, \ldots, x_n, \ldots, x_{n+r}, 0, 0, 0, \ldots) \in A_k$,
and so $\in A_k \cap I$. Hence $(x_1, x_2, \ldots) \in M_l(A_k \cap I)$. So $A_k \subset M_l(A_k \cap I)$. Now suppose
$(x_1, x_2, \ldots) \in M^u(A_k \cap I)$. So there is an $n > m$ such that $(x_1, \ldots, x_n, 0, 0, \ldots) \in A_k \cap I$.
So at least one of the numbers $f(x_1, \ldots, x_m, 0, 0, \ldots)$, $f(x_1, \ldots, x_{m+1}, 0, 0, \ldots)$, ..., $f(x_1, \ldots, x_n,
0, 0, 0, \ldots)$ must be $> y_k$. Hence $(x_1, x_2, \ldots) \in A_k$. Thus $M^u(A_k \cap I) \subset A_k$.

Since $M(A_k \cap I) = A_k$ and conditions G and H hold, we have, by the Magni-
fication Theorem given in Section 3, $\delta(A_k \cap I) = P(A_k)$. The rest of the proof goes as
before.


6. APPLICATION TO LOGARITHMIC DENSITY

We now take up an important concrete case where Postulates (A) to (F) and
conditions G and H hold. We recall that if $A$ is any set of positive integers,

$$\limsup_{k \to \infty} \frac{1}{\log k} \sum_{n \in A,\ n \le k} \frac{1}{n} \quad \text{and} \quad \liminf_{k \to \infty} \frac{1}{\log k} \sum_{n \in A,\ n \le k} \frac{1}{n}$$

are respectively called the upper and lower logarithmic densities of $A$. If they are equal,
this value is called the logarithmic density of $A$. If $L_l(A)$, $L^u(A)$, $N_l(A)$ and $N^u(A)$
are respectively the lower logarithmic, upper logarithmic, lower natural and upper
natural densities of $A$, then $N_l(A) \le L_l(A) \le L^u(A) \le N^u(A)$.

Let $p_1 = 2, p_2, p_3, \ldots$ be the prime numbers in ascending order of magnitude.
Consider the space $X_n$, $n = 1, 2, 3, \ldots$. Let $p_n^r$ $(r = 0, 1, 2, \ldots)$ carry measure
$\left(1 - \frac{1}{p_n}\right)\frac{1}{p_n^{r}}$. In the space $X$ we introduce the product measure. Let $(x_1, x_2, \ldots, x_n,
0, 0, \ldots) \in I$; with this vector we associate the positive integer $p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n}$. If $S \subset I$,
we define $\delta^u(S)$ to be the upper logarithmic density of the set of positive integers cor-
responding to the vectors in $S$. It is easy to verify that this $\delta^u$ satisfies Postulates
(A) to (F) and that the measure that arises in $X$ coincides with the product measure we
have formed. Condition G is true in this case. (This is true even for natural
density.)
We now verify condition H. Let $B$ be a right-complete subset of $I$. We shall
prove that $B$ has logarithmic density equal to $P[M(B)]$. Since logarithmic density
is finitely additive, the result is immediate if $B$ has only a finite number of basic
vectors. So we suppose that $B$ has infinitely many basic vectors. We shall employ
the following theorem (Wintner, 1944, p. 53): a set $S$ of positive integers has logarithmic
density $L(S)$ if and only if $(s - 1)\sum_{n \in S} n^{-s}$ tends to a limit as $s \to 1 + 0$, in which case
the limit is $L(S)$.
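Let $\xi_1, \xi_2, \ldots$ be the basic vectors in $B$. Let $\xi_r = (x_{r1}, x_{r2}, \ldots, x_{rn_r}, 0, 0, 0, \ldots)$,
$x_{rn_r} \ge 1$, and let $a_r = p_1^{x_{r1}} p_2^{x_{r2}} \cdots p_{n_r}^{x_{rn_r}}$ be the integer corresponding to $\xi_r$. Then $\sum n^{-s}$,
where each $n$ corresponds to a vector of $B$ having $\xi_r$ as its basic vector, is

$$\frac{1}{a_r^{s}} \prod_{k > n_r} \left(1 - \frac{1}{p_k^{s}}\right)^{-1} = \zeta(s)\, \frac{1}{a_r^{s}} \prod_{k=1}^{n_r} \left(1 - \frac{1}{p_k^{s}}\right),$$

this being true for all $s > 1$. Thus

$$(s - 1) \sum_{n \in B} \frac{1}{n^{s}} = (s - 1)\, \zeta(s) \sum_r f_r(s),$$

where

$$f_r(s) = \frac{1}{a_r^{s}} \prod_{k=1}^{n_r} \left(1 - \frac{1}{p_k^{s}}\right).$$

Since $\lim_{s \to 1+0} (s - 1)\zeta(s) = 1$, we now have to prove that

$$\lim_{s \to 1+0} \sum_r f_r(s) = \sum_r f_r(1),$$

this latter sum being the measure of $M(B)$, since $f_r(1) = \prod_{k \le n_r}(1 - 1/p_k)\,p_k^{-x_{rk}} = P(C_r)$.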
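The key identity for one basic vector can be checked numerically. For an illustrative basic vector $\xi = (1, 0, 2)$, so that $a = 2 \cdot 5^2 = 50$ and $n_r = 3$, the integers belonging to its cylinder are $50b$ with $b$ free of the primes 2, 3 and 5, and the sum of $(50b)^{-s}$ over them should equal $\zeta(s)\, 50^{-s} \prod_{k \le 3}(1 - p_k^{-s})$. The sketch checks this at $s = 2$, where $\zeta(2) = \pi^2/6$ is known exactly, using an illustrative cutoff `b_max` on $b$, and checks that the same expression without the factor $\zeta$, taken at $s = 1$, is the cylinder probability $\prod_{k \le 3}(1 - 1/p_k)p_k^{-x_k}$.

```python
import math


def smallest_prime_factors(limit: int) -> list[int]:
    """spf[b] = smallest prime factor of b, for 2 <= b <= limit."""
    spf = list(range(limit + 1))
    for i in range(2, math.isqrt(limit) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf


def cylinder_sum(a: int, largest_prime: int, s: float, b_max: int) -> float:
    """Sum of (a*b)**(-s) over b <= b_max whose prime factors all exceed largest_prime (b = 1 included)."""
    spf = smallest_prime_factors(b_max)
    total = a ** -s
    for b in range(2, b_max + 1):
        if spf[b] > largest_prime:
            total += (a * b) ** -s
    return total


def f_r(a: int, primes: list[int], s: float) -> float:
    """a**(-s) times the product over the listed primes of (1 - p**(-s))."""
    return a ** -s * math.prod(1 - p ** -s for p in primes)


if __name__ == "__main__":
    a, primes = 50, [2, 3, 5]  # xi = (1, 0, 2): a = 2**1 * 3**0 * 5**2
    lhs = cylinder_sum(a, primes[-1], 2, 100_000)
    rhs = (math.pi ** 2 / 6) * f_r(a, primes, 2)
    print(lhs, rhs)
```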


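Since $\sum_{n \in B} n^{-s} \le \zeta(s)$, we have $\sum_r f_r(s) \le 1$. The series
$\sum_r f_r(s)$ is uniformly convergent on $1 + \delta \le s \le 2$ for every $\delta > 0$; this is seen by
noting that

$$f_r(s) \le \frac{1}{a_r^{1+\delta}}, \qquad 1 + \delta \le s \le 2,$$

and that $\sum_r a_r^{-(1+\delta)} \le \zeta(1 + \delta) < \infty$, the integers $a_r$ being distinct.

Let us put $g_N(s) = \sum_{r=1}^{N} f_r(s)$ and $g(s) = \sum_r f_r(s)$. Then $0 \le g_N(s) \le g(s) \le 1$. Clearly
$\lim_{s \to 1+0} g_N(s) = g_N(1)$, and since $g(s) \ge g_N(s)$, $\liminf_{s \to 1+0} g(s) \ge g_N(1)$ for every $N$; hence
$\liminf_{s \to 1+0} g(s) \ge g(1)$. Also, $g_N(1) \le g(1)$.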
We are now ready to prove that $g(s) \to g(1)$ as $s \to 1 + 0$. Suppose $\limsup_{s \to 1+0} g(s)$
is $> g(1)$. We recall that $g_N(s)$ converges to $g(s)$ uniformly on $[1 + \delta, 2]$ for every
$\delta > 0$. Let us choose a value $s_1$ of $s$ very close to 1 such that $g(s_1)$ is almost equal to
$\limsup_{s \to 1+0} g(s)$. Let us then choose a large $N$ so that $g_N(s_1)$ is almost equal to $g(s_1)$. Now
$g_N(1)$ is $\le g(1) < \limsup_{s \to 1+0} g(s)$, and $g_N(s_1)$ is almost equal to this latter value. Thus
we can find an $s_2$ exceeding 1 by an arbitrarily small quantity and an $N$ such that
$g_N'(s_2)$ is arbitrarily large (and positive). Let us now make this reasoning precise.

Let $\epsilon$ be any fixed positive number $< \frac{1}{2}\{\limsup_{s \to 1+0} g(s) - g(1)\}$; let $K$ be any arbitrarily
large positive number; let $s_1$ be any number $> 1$ such that $g(s_1) > \limsup_{s \to 1+0} g(s) - \epsilon$.
Let $N(s_1) = N$ be so large that $|g_N(s_1) - g(s_1)| < \epsilon$. Then $g_N(s_1) > \limsup_{s \to 1+0} g(s) - 2\epsilon$.
So by the mean-value theorem, there is an $s_2$ such that $1 < s_2 < s_1$ and

$$g_N'(s_2) = \frac{g_N(s_1) - g_N(1)}{s_1 - 1} > \frac{\limsup_{s \to 1+0} g(s) - g(1) - 2\epsilon}{s_1 - 1}.$$

By moving $s_1$ sufficiently close to 1 we can make this ratio $> K$. So $g_N'(s_2) > K$.


We shall now see that there is an absolute constant А such that 9.(8) < A for 
all N and all se(1, 2). Та fact, 


ву = * og dm _ P. gn 
по = iie (S, Edo — Yos (^a } 


«fol X в, — log |, since я, > 1. 


mel Ча 1 


Now an elementary theorem (Hardy and Wright, 1954, p. 348) in the analytic theory
of numbers states that there is an absolute constant $C$ such that for all $n$


$108 dn 0 
(<; < log dut ` 
108 dn m=1 (@ —1) 


So fils) < fle)? 


gye) = $ He) < € gal) < ©. 


This contradiction proves that $\lim_{s \to 1+0} g(s) = g(1)$.
It may be noted that condition H no longer holds if, in the preceding discus-
sion, logarithmic density is replaced by natural density. In fact, Besicovitch (1934,
pp. 336-341) has constructed a set of positive integers such that the set S of their
multiples has no natural density; but S is a right-complete set.

This paper is based on a dissertation submitted in partial fulfilment of the
requirements for the Ph.D. degree at the University of Illinois in 1960. The author
wishes to express his gratitude to Professor J. L. Doob for his guidance during the
preparation of the thesis.
ON STRUCTURE, RELATION, Σ, AND EXPECTATION OF
MEAN SQUARES*


By GEORGE ZYSKIND 
Iowa State University 


SUMMARY. Some properties of balanced population structures are investigated. The forms 
for the expected value of squares of balanced sample means from balanced population structures are 
obtained, and this is done also for the type of sample mean arising in a class of randomized experiments. 
The canonical form of the Σ expansion of the expected value of the square of the sample mean is
demonstrated. The form of the expected value of mean squares in the analysis of variance for a large 
class of situations is then derived as a consequence. 


1. INTRODUCTION


The present paper concerns itself with a rather general approach to certain 
basic aspects of questions of experimental design connected with the technique of 
the analysis of variance. 

The mathematical representation of common experimental designs is generally 
considered to be covered by various special cases of the general linear hypothesis 
theory in its current formulation. Though that theory has been extremely success- 
ful and is very useful, some of its drawbacks have of late been considered sufficiently 
important to warrant an approach not covered by it at present. Thus, Kempthorne 
(1952) explicitly introduces randomization variables in order that the mathematical 
representation related to the designs he considers reflect a one-to-one correspondence 
with the way the experiments are to be carried out. This approach was insisted 
upon in later publications by Kempthorne (1955), Wilk (1955a and 1955b) and Wilk 
and Kempthorne (1955, 1956a, 1956b, 1957). Further, to strengthen the foundations 
of their approach and also to explain their position in connection with the “mixed 
model controversy” these authors decided to use “derived linear” rather than “assumed 
linear” models in all the particular problems considered. One characteristic of 
“derived” models is that their application does not require any assumption concerning 
the form of the response as a function of the values of the factors influencing it. 

In the formulation of a derived statistical model for a particular experimental
situation, one first specifies the “population identity.” The population identity
expresses the typical “actual” or “conceptual” response as a sum of ‘population
components,’ each of which is a relevant linear function of the possible actual or
conceptual numbers yielded by the experiment. The components are constructed so as
to bear a high degree of correspondence with the usual main effect and interaction
terms of assumed linear models. They are precisely defined, but all that is required
for their construction, as indeed for the construction of the identity when balance
obtains, is the specification of the “population structure” with regard to the
relationship of “nesting” among the set of individual entities envisaged to possibly
influence the observed response.
In the exploration of results obtained in his thesis, Wilk (1955a) found that
introduction of certain well-defined linear functions of the usual components of
variation, the σ²'s, substantially simplified the form of expectations of mean squares in
all the analysis of variance tables he considered. Wilk denoted these quantities by
Σ's (read as cap sigmas), suggesting that under symmetric conditions extensions of
the results on expectations of mean squares were implicit in the forms obtained for
the cases he studied. The paper on non-additivities in a Latin Square design by
Wilk and Kempthorne (1957) exemplifies the types of Σ expressions involved.


In the present paper we discuss the notion of relation and structure in
experimental design and give, in the notation here introduced, the general definition
of the Σ quantities. The main result of the paper concerns the simple general Σ form
of expected values of squares of typical observational means involved in analysing
experiments. The simple Σ form of expected mean squares in the analysis of variance
then follows as a direct consequence of the fact that in "balanced" cases each such
expectation can be written out uniquely and in an easily specified manner as a linear
function of expected values of squares of the typical observational means. As the
contents of the present paper make clear, the applicability of the above statements
to physical problems is, indeed, very broad.


2. POPULATION STRUCTURES 


Consider a response designated by Y. Suppose it is envisaged to depend
entirely on a finite number of entities, e.g., pressure, temperature, etc., every one of
which is indicated by a corresponding subscript in the notation for the response, where
the range of these subscripts is over the possible levels of the entity in question. We
restrict ourselves to situations in which every combination of the levels of subscripts
is admissible. The physical layout and character of the entities in question are usually
such as to admit a natural set-up with respect to the relation of hierarchal arrangements
of the entities. An entity is said to be hierarchal, or nested, within another set of
entities, S say, if the unique identification of any one of its elements requires also the
specification of some particular set of elements of the entities S, and the entities S
are said to nest it. For example, in the structure of the experimental material of the
randomized block design, the unique identification of a particular plot requires not
only the plot number but also the specification of the block containing the plot. Thus,
if we denote a typical possible response in the randomized block by $Y_{ijk}$, where the
index i refers to the block classification, the index j to plot, and the index k to the
treatment classification, then the unique specification of a plot always requires not only
the particular value of the index j but also the value of the index i appropriate
to the block in which the plot is nested. Symbolically, the structure of the entities
involved in a typical conceptual response of a randomized block may be expressed
as (i : j)(k), where the brackets separate the different types of entities and the colon
indicates that the unit entity, j, is nested in the block entity i. According to our
notation it would make no sense to write j : i in this example.
A more complex structure of the fundamental entities is exemplified by the
case where the experimental material is stratified into sources, S, with each source
being cross-classified by rows, R, and columns, C. Suppose further that there are
two types of treatment factors, say A and B, but that repeated attempts at the same
level of a treatment are subject to errors in realization. A conceptual response is,
therefore, due to both the level aimed at and a deviation from it, or a sublevel. The
treatment structure then is hierarchal: sublevels within levels. If we denote sublevels
of A and B by α and β, respectively, then a symbolic representation of the
structure of the entities involved is

(S : RC)(A : α)(B : β).

An extension of the above example is furnished by the case where the columns
within sources are further subdivided into strips, denoted by L. The rows do not nest
the strips, and hence the complete representation of the structure of the entities involved
requires sub-brackets within the main bracket for the experimental material. The
symbolic representation of the situation is

(S : (R)(C : L))(A : α)(B : β).

Finally, consider the following illustrative example. Four entities are involved
in the structure. Denote them by P, Q, S, and R. The set of relations is: Q
nested in S, and R nested in SP combinations. Symbolically, (S : Q)(P) and (SP : R).

One can obtain partial population means by averaging over the entire range
of values of particular sets of subscripts. Partial means are denoted by the usual
symbol for a response but with omission of subscripts over which the average has been
taken. An admissible mean is defined as one in which, whenever a nested index
appears, all the indices which nest it appear also. Our considerations are restricted
to admissible means only. The indices of an admissible partial mean which
nest no other indices of that mean are said to constitute the set of indices belonging
to the rightmost bracket. It is convenient to indicate the grouping of the indices
of the rightmost bracket by using parentheses, ( ), and also to group in this way other
sets of indices when we wish to emphasize that for some structural reason they belong
to the same category. Thus, in our randomized block case the admissible partial
means are six in number and may be denoted by: $Y$, $Y_{(i)}$, $Y_{(k)}$, $Y_{(ik)}$, $Y_{(i)(j)}$, $Y_{(i)(jk)}$.

From every partial mean, linear combinations of means can be formed which
are of special physical and formal significance. These linear combinations, henceforth
called components, are obtained by selecting all those partial means which are
yielded by the mean in question when some, all, or none of its rightmost bracket
subscripts are omitted in all possible ways. Whenever an odd number of indices is
omitted the mean is to be preceded by a negative sign; whenever an even number is
omitted the mean is to be preceded by a positive sign. The number zero is considered
even. For example, in the randomized block the partial mean $Y_{(i)}$ leads to the
component $(Y_{(i)} - Y)$, the mean $Y_{(i)(jk)}$ to $(Y_{(i)(jk)} - Y_{(i)(j)} - Y_{(ik)} + Y_{(i)})$, and the
mean $Y$ to $(Y)$. The components thus constructed have a correspondence with the
effects and interactions of the usual assumed linear models. They also bear a
correspondence to the various terms of a Taylor expansion of well-behaved functions.
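The construction rule for components can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the dictionary `NESTS` encoding the randomized-block structure (i : j)(k) and the helper names `is_admissible`, `rightmost` and `component` are hypothetical. Each component is returned as a list of (sign, index set) pairs, where the index set names the partial mean.

```python
from itertools import combinations

# Hypothetical encoding of the randomized-block structure (i : j)(k):
# NESTS[x] is the set of indices that x nests (block i nests plot j).
INDICES = ("i", "j", "k")
NESTS = {"i": {"j"}, "j": set(), "k": set()}

def is_admissible(s):
    """A mean is admissible if every index nesting a present index is also present."""
    return all(outer in s for outer in INDICES for inner in NESTS[outer] if inner in s)

def rightmost(s):
    """Indices of the mean that nest no other index of the same mean."""
    return frozenset(x for x in s if not (NESTS[x] & s))

def component(s):
    """Signed partial means forming the component led by the mean with index set s."""
    s = frozenset(s)
    rb = sorted(rightmost(s))
    terms = []
    for r in range(len(rb) + 1):            # r = number of rightmost indices omitted
        for omitted in combinations(rb, r):
            sign = -1 if r % 2 else 1        # odd number omitted -> negative sign
            terms.append((sign, s - set(omitted)))
    return terms

# All admissible partial means of the randomized block.
admissible = [frozenset(c) for r in range(len(INDICES) + 1)
              for c in combinations(INDICES, r) if is_admissible(frozenset(c))]
```

For example, `component({"i", "j", "k"})` yields the four signed means of $(Y_{(i)(jk)} - Y_{(i)(j)} - Y_{(ik)} + Y_{(i)})$, and `admissible` contains the six admissible means listed in the text.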

The following facts are immediate consequences of the definitions.

(1) If the rightmost group of the leading term of a component consists of
p indices, then the number of means present in the component is $2^p$. This follows from
the fact that the number of means in the component is equal to the total number of
groups of size zero to p inclusive that can be formed from p different objects. Since
each object can be dealt with in exactly two ways (it can be included in a group or
excluded from it), the total number of groups is $2^p$.

(2) The sum of the coefficients of means in any component is zero, excluding
the coefficients of the component Y, for which the sum is one. This is so because
coefficients plus one or minus one are assigned to partial means according to an “even-odd”
criterion. Adding the signed coefficients of the partial means we get

$$\sum_{i=0}^{p} \binom{p}{i}(-1)^i = (1+x)^p\Big|_{x=-1} = (1-1)^p = 0 \quad \text{for } p \neq 0. \qquad \text{q.e.d.}$$


We now show that for any given population structure the typical response
can be expressed identically as a sum of all its corresponding components. This
relation is called the population identity. To establish the identity, consider any one
admissible partial mean. The partial mean appears in the expanded form of those
components for which the subscripts of the leading terms exceed or are identical with
the subscripts of the partial mean, and for which the excess subscripts appear only
in the rightmost bracket of the leading term. It follows that all the qualifying leading
means can be obtained from the partial mean by adding to its indices none, some,
or all of the indices of a particular finite set. Further, whenever the number
of excess indices is even, the corresponding component contains the partial mean
considered with coefficient plus one, and whenever it is odd, with coefficient minus
one. Hence, by the argument developed above, for any partial mean except the
one containing all indices of an individual response, the sum of its coefficients
over all the components is zero. For the exceptional term the coefficient is one.
This completes the proof that the decomposition of an individual conceptual response
into a sum of components is an identity. q.e.d.


As an illustration, we notice that the term $Y_{(i)}$ in the randomized block case
appears only in components whose leading terms have none of, one of,
or both of j and k as an excess over i in their rightmost group. The number of such
terms is $2^2 = 4$. They are $Y_{(i)}$, $Y_{(i)(j)}$, $Y_{(ik)}$, $Y_{(i)(jk)}$, and their corresponding components are

$(Y_{(i)} - Y)$, $(Y_{(ik)} - Y_{(i)} - Y_{(k)} + Y)$,
$(Y_{(i)(j)} - Y_{(i)})$, $(Y_{(i)(jk)} - Y_{(i)(j)} - Y_{(ik)} + Y_{(i)})$.


We further note that the sum of the coefficients of $Y_{(i)}$ over these components is zero, as it
should be.
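The population identity can also be checked numerically. The sketch below is an illustration under stated assumptions: it builds a small hypothetical randomized-block population (B = 2 blocks, P = 3 plots, T = 2 treatments) with integer values stored as exact fractions, evaluates every component at every cell, and confirms that the six components add back to the response. The helper names `pmean` and `comp_value` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
import random

# Hypothetical balanced randomized-block population Y[(i, j, k)].
B, P, T = 2, 3, 2
random.seed(1)
Y = {cell: Fraction(random.randint(0, 20)) for cell in product(range(B), range(P), range(T))}
POS = {"i": 0, "j": 1, "k": 2}                      # position of each index in a cell tuple
NESTS = {"i": {"j"}, "j": set(), "k": set()}        # structure (i : j)(k)
ADMISSIBLE = [frozenset(s) for s in ("", "i", "k", "ik", "ij", "ijk")]

def pmean(keep, cell):
    """Partial mean: average of Y over all cells agreeing with `cell` on the indices in `keep`."""
    vals = [y for c, y in Y.items() if all(c[POS[x]] == cell[POS[x]] for x in keep)]
    return sum(vals) / len(vals)

def comp_value(s, cell):
    """Value at `cell` of the component led by the mean with index set s."""
    rb = [x for x in s if not (NESTS[x] & s)]       # rightmost-bracket indices
    total = Fraction(0)
    for r in range(len(rb) + 1):
        for omitted in combinations(rb, r):
            total += (-1) ** r * pmean(s - set(omitted), cell)
    return total

# Population identity: the components sum to the response in every cell.
identity_holds = all(sum(comp_value(s, c) for s in ADMISSIBLE) == y for c, y in Y.items())
```

Exact fractions are used so that the identity is tested with equality rather than a floating-point tolerance.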


The population components can be shown to satisfy some very simple and
useful relations when the range of any one subscript of the population structure is
the same for every particular set of values of the other subscripts. When this
condition obtains, the population structure is said to be balanced. For reasons of
mathematical simplicity, and because structures involved in experimental designs
are generally chosen to be balanced, the population structures discussed in the present
paper will be taken to satisfy the balance condition.

We now state a number of basic assertions about balanced population struc- 
tures. Details of proofs which are omitted can be found in Zyskind (1958): 

(1) For any type of component, the sum of values of the components is zero
over the population range of any one index of the rightmost bracket of the leading
term of the type.

Thus, the above statement generalizes identities such as $\sum_i (Y_{(i)} - Y) = 0$.


(2) The sum of squares of the responses over all values of all indices is
equal to the sum of squares of all the individual typical components over the ranges
of these same indices.

The above assertion generalizes identities such as

$$\sum_{i}\sum_{j} Y_{ij}^2 = \sum_{i}\sum_{j} \left[ Y^2 + (Y_{(i)} - Y)^2 + (Y_{ij} - Y_{(i)})^2 \right].$$


Proof: Denote the value of a component by a capital letter and subscripts,
using subscripts identical with those of the leading term of the component. The
product of the values of two individual components of different types involves an
index which is in the rightmost bracket of one of the components but not at all among
the indices of the other component. By assertion 1 the sum of products of these
two types of components over such an index is zero. Hence, the sum of products
of components of different types over all the indices of the population is zero. The
validity of the assertion follows as a consequence. q.e.d.
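Assertions 1 and 2 can be checked on a small example. The sketch assumes a hypothetical one-fold nested population (i : j) with A = 3 groups of B = 4 items; it forms the components $Y$, $(Y_{(i)} - Y)$ and $(Y_{ij} - Y_{(i)})$, verifies that each sums to zero over its rightmost-bracket index, and verifies that the total sum of squares splits into the component sums of squares, each summed over all indices of a response.

```python
from fractions import Fraction
import random

# Hypothetical one-fold nested population: A groups of B items, Y[i][j].
A, B = 3, 4
random.seed(7)
Y = [[Fraction(random.randint(-5, 5)) for _ in range(B)] for _ in range(A)]

grand = sum(map(sum, Y)) / (A * B)                                # Y (overall mean)
group = [sum(row) / B for row in Y]                               # Y_(i) (group means)
A_comp = [g - grand for g in group]                               # components (Y_(i) - Y)
B_comp = [[y - group[i] for y in row] for i, row in enumerate(Y)] # components (Y_ij - Y_(i))

# Assertion 1: each component type sums to zero over an index of its rightmost bracket.
assert sum(A_comp) == 0
assert all(sum(row) == 0 for row in B_comp)

# Assertion 2: sums of squares over all indices i, j of a response.
total_ss = sum(y * y for row in Y for y in row)
split_ss = (A * B * grand ** 2                      # Y repeated in all A*B cells
            + B * sum(a * a for a in A_comp)        # (Y_(i) - Y) repeated over j
            + sum(b * b for row in B_comp for b in row))
assert total_ss == split_ss
```

The factors A·B and B appear because a component that does not involve an index is constant over that index, which is the balance property at work.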

(3) The total number of linearly independent components is exactly N, where
N is the total number of possibly different responses.

(4) The number of linearly independent values of a given type of component
is equal to the product of the population ranges of the indices of the component which
do not belong to its rightmost bracket, times the product of the diminished ranges of
the indices of the rightmost bracket. (By the range is here meant the number of distinct
values taken on by the index in question; the diminished range equals the range minus
one.) Further, the sum of these numbers over the components of different types is
equal to N.
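The counting rule of assertion 4 translates directly into code. The sketch below is illustrative only: it assumes the randomized-block structure (i : j)(k) with hypothetical ranges B = 5 blocks, P = 4 plots and T = 3 treatments, and checks that the degrees of freedom of the six component types add up to the number of responses BPT = 60.

```python
from math import prod

# Randomized-block structure (i : j)(k); the ranges are hypothetical.
NESTS = {"i": {"j"}, "j": set(), "k": set()}
RANGES = {"i": 5, "j": 4, "k": 3}                   # B = 5, P = 4, T = 3
ADMISSIBLE = [frozenset(s) for s in ("", "i", "k", "ik", "ij", "ijk")]

def rightmost(s):
    """Indices of the component that nest no other index of it."""
    return {x for x in s if not (NESTS[x] & s)}

def degrees_of_freedom(s):
    """Assertion 4: full ranges for indices outside the rightmost bracket,
    diminished ranges (range - 1) for indices inside it."""
    rb = rightmost(s)
    return prod(RANGES[x] - 1 if x in rb else RANGES[x] for x in s)

df = {"".join(sorted(s)): degrees_of_freedom(s) for s in ADMISSIBLE}
```

Here `df` equals `{"": 1, "i": 4, "k": 2, "ik": 8, "ij": 15, "ijk": 30}`, i.e. $1$, $B-1$, $T-1$, $(B-1)(T-1)$, $B(P-1)$ and $B(P-1)(T-1)$, which sum to $BPT$.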
t the sum of squares of values of the compo- 


(5) For every type of component, the sum of squares of values of the component
over the ranges of all the indices of the rightmost bracket is equal to the sum
over the same set of indices of a linear function of squares of partial means making
up the component, with the coefficients of the squares being the same as those
defining the component in terms of corresponding partial means.

The above assertion generalizes identities such as

$$\sum_{j,k} \big(Y_{(i)(jk)} - Y_{(i)(j)} - Y_{(ik)} + Y_{(i)}\big)^2 = \sum_{j,k} \big(Y_{(i)(jk)}^2 - Y_{(i)(j)}^2 - Y_{(ik)}^2 + Y_{(i)}^2\big)$$

for all values of i.


Proof: For a particular type of component consider the sum of squares of
its values over all the indices of the rightmost bracket. If there are p indices of the
rightmost bracket, then there are $2^p$ subsets of these; these subsets are in one-to-one
correspondence with the partial means appearing in the initial and final expressions.
To establish the identity one first expands the square and puts the resulting square
and cross-product terms into $2^p$ categories, the category being specified by the set of
rightmost bracket indices which the two factors of the product have in common.
Such sets will be called ‘intersection sets.’ Further, for each product we define the
‘excess set’ to consist of those indices which appear in exactly one of the two factors.
The coefficient of any product is then $(-1)^i$, where i is the number of indices in its
excess set. Because of balance, when any product is summed over its excess set one
obtains the square of the partial mean corresponding to the intersection set times the
product of the ranges of the excess indices. The product of ranges can then be replaced
by summation over the excess indices, thus restoring the summation over all rightmost
bracket indices. It remains to combine like terms. Let p, q, i denote respectively
the number of indices in the rightmost bracket, the intersection set, and the excess set.
For fixed p, q one has i = 0, 1, ..., p − q. All possible combinations of excess indices
will appear in the expansion for each i; these are $\binom{p-q}{i}$ in number. Further, for any
particular combination the i indices may be divided between the factors in $2^i$ ways.
Thus for q indices in the intersection set, i.e., q indices in the partial mean, the
coefficient of the squared partial mean is

$$\sum_{i=0}^{p-q} \binom{p-q}{i}(-2)^i = (1+x)^{p-q}\Big|_{x=-2} = (-1)^{p-q} = \begin{cases} 1 & \text{if } p-q \text{ is even},\\ -1 & \text{if } p-q \text{ is odd}. \end{cases}$$

Thus, the coefficient of the square of the partial mean is the same as the
coefficient of that mean in the definition of the component. q.e.d.
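Both ingredients of the proof can be spot-checked. The sketch below is an illustration, not part of the original argument: it confirms the binomial identity for the squared-mean coefficients over a range of p and q, and then checks the identity of assertion 5 for the $(i)(jk)$ component on a hypothetical randomized-block population with B = 2, P = 3, T = 4. The function name `assertion5_holds` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb
import random

# 1) Coefficient identity used in the proof: sum_i C(p-q, i) (-2)^i = (-1)^(p-q).
coefficient_ok = all(
    sum(comb(p - q, i) * (-2) ** i for i in range(p - q + 1)) == (-1) ** (p - q)
    for p in range(8) for q in range(p + 1))

# 2) Assertion 5 for the (i)(jk) component on a hypothetical population Y[i][j][k].
B, P, T = 2, 3, 4
random.seed(3)
Y = [[[Fraction(random.randint(0, 9)) for _ in range(T)] for _ in range(P)] for _ in range(B)]

def assertion5_holds(i):
    """Compare both sides of the assertion-5 identity within block i."""
    cells = list(product(range(P), range(T)))
    y_i = sum(Y[i][j][k] for j, k in cells) / (P * T)                 # Y_(i)
    y_ij = [sum(Y[i][j]) / T for j in range(P)]                       # Y_(i)(j)
    y_ik = [sum(Y[i][j][k] for j in range(P)) / P for k in range(T)]  # Y_(ik)
    lhs = sum((Y[i][j][k] - y_ij[j] - y_ik[k] + y_i) ** 2 for j, k in cells)
    rhs = sum(Y[i][j][k] ** 2 - y_ij[j] ** 2 - y_ik[k] ** 2 + y_i ** 2 for j, k in cells)
    return lhs == rhs
```

A numerical check of this kind does not replace the proof, but it guards against sign slips when the identity is used to build expected mean squares.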
Corollary: The type of identity specified by assertion 5 is valid when
summation takes place over the ranges of all the indices of a typical response.
Exploitation of identities specified by the corollary forms the basis of our
approach to finding expected values of mean squares in the analysis of variance table.
Definition: The number of linearly independent values of components of
a type is said to be the number of degrees of freedom of the type of component.
Also, the number of linearly independent possible responses is said to be the number
of degrees of freedom of the set of possible responses. We shall often abbreviate the
expression degrees of freedom by writing D.F.


For a given type of component the number of degrees of freedom is as stated
in assertion 4.

Definition : The sum of squares of values of components of a given type 
over all the population ranges of the indices used to denote the component divided 
by the number of degrees of freedom of the type is said to be the component of varia- 
tion of the type of component. 

Henceforth we denote particular components of variation by σ²'s with
subscripts, bracketed into groups, corresponding to the subscripts of the types of
components to which the σ²'s refer.

We now define the Σ's.

Definition: Consider a particular type of component and all σ²'s of the
following form:

(i) the set of subscripts of σ² includes the set of subscripts corresponding
to the leading term of the component as a subset;

(ii) the excess subscripts belong exclusively to the rightmost bracket of σ².

The linear combination of all such σ²'s, where the coefficient of a particular
σ² with k excess subscripts is

$$\frac{(-1)^k}{\text{product of population ranges of the excess indices}},$$

is defined as the Σ corresponding to the type of component under consideration.
The subscript notation for the Σ is to be the same as for the type of component.

It should be pointed out that the component of variation corresponding
to the null set is $Y^2 = \sigma^2$, and that the corresponding Σ is uniquely defined.
The introduction of Σ is of prime importance to the development of the present
approach.
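To fix ideas, consider randomized blocks with B blocks each of P plots, and
with T treatments. The Σ's are as follows:

$$\begin{aligned}
\Sigma &= \sigma^2 - \frac{1}{B}\sigma^2_{(B)} - \frac{1}{T}\sigma^2_{(T)} + \frac{1}{BT}\sigma^2_{(BT)},\\
\Sigma_{(B)} &= \sigma^2_{(B)} - \frac{1}{T}\sigma^2_{(BT)} - \frac{1}{P}\sigma^2_{(B)(P)} + \frac{1}{PT}\sigma^2_{(B)(PT)},\\
\Sigma_{(T)} &= \sigma^2_{(T)} - \frac{1}{B}\sigma^2_{(BT)},\\
\Sigma_{(BT)} &= \sigma^2_{(BT)} - \frac{1}{P}\sigma^2_{(B)(PT)},\\
\Sigma_{(B)(P)} &= \sigma^2_{(B)(P)} - \frac{1}{T}\sigma^2_{(B)(PT)},\\
\Sigma_{(B)(PT)} &= \sigma^2_{(B)(PT)}.
\end{aligned}$$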
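The Σ definition is mechanical enough to automate. The following sketch assumes the randomized-block structure above (blocks B nest plots P; treatments T crossed with both); the helpers `admissible_sets`, `rightmost`, `label` and `capital_sigma` are hypothetical names. Each Σ is returned as a dictionary mapping the subscript of a qualifying σ² to its sign and the string of excess indices whose ranges form the denominator.

```python
from itertools import combinations

# Randomized blocks: blocks B nest plots P; treatments T are crossed with both.
LETTERS = ("B", "P", "T")
NESTS = {"B": {"P"}, "P": set(), "T": set()}

def admissible_sets():
    """Index sets in which every index nesting a present index is also present."""
    out = []
    for r in range(len(LETTERS) + 1):
        for c in combinations(LETTERS, r):
            s = frozenset(c)
            if all(o in s for o in LETTERS for inner in NESTS[o] if inner in s):
                out.append(s)
    return out

def rightmost(s):
    return frozenset(x for x in s if not (NESTS[x] & s))

def label(s):
    """Paper-style subscript, e.g. {B, P} -> "(B)(P)"; the null set has no subscript."""
    if not s:
        return ""
    rb = rightmost(s)
    outer = "".join(x for x in LETTERS if x in s and x not in rb)
    inner = "".join(x for x in LETTERS if x in rb)
    return f"({outer})({inner})" if outer else f"({inner})"

def capital_sigma(lead):
    """Terms of the Sigma for component type `lead` as {label: (sign, excess indices)}.

    A sigma^2 with subscript set s qualifies when s contains `lead` and every excess
    index lies in the rightmost bracket of s; its coefficient is (-1)^k divided by
    the product of the population ranges of the k excess indices.
    """
    lead = frozenset(lead)
    terms = {}
    for s in admissible_sets():
        excess = s - lead
        if lead <= s and excess <= rightmost(s):
            terms[label(s)] = ((-1) ** len(excess), "".join(x for x in LETTERS if x in excess))
    return terms
```

For instance, `capital_sigma("B")` returns a dictionary equal to `{"(B)": (1, ""), "(B)(P)": (-1, "P"), "(BT)": (-1, "T"), "(B)(PT)": (1, "PT")}`, which reads as $\sigma^2_{(B)} - \sigma^2_{(B)(P)}/P - \sigma^2_{(BT)}/T + \sigma^2_{(B)(PT)}/(PT)$.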
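As another example we may verify that for the structure specified by

(S : Q)(P) and (SP : R),

where the respective population ranges are S, P, Q and R, the set of Σ's is

$$\begin{aligned}
\Sigma &= \sigma^2 - \frac{1}{S}\sigma^2_{(S)} - \frac{1}{P}\sigma^2_{(P)} + \frac{1}{SP}\sigma^2_{(SP)},\\
\Sigma_{(S)} &= \sigma^2_{(S)} - \frac{1}{P}\sigma^2_{(SP)} - \frac{1}{Q}\sigma^2_{(S)(Q)} + \frac{1}{PQ}\sigma^2_{(S)(PQ)},\\
\Sigma_{(P)} &= \sigma^2_{(P)} - \frac{1}{S}\sigma^2_{(SP)},\\
\Sigma_{(SP)} &= \sigma^2_{(SP)} - \frac{1}{Q}\sigma^2_{(S)(PQ)} - \frac{1}{R}\sigma^2_{(SP)(R)} + \frac{1}{QR}\sigma^2_{(SP)(QR)},\\
\Sigma_{(S)(Q)} &= \sigma^2_{(S)(Q)} - \frac{1}{P}\sigma^2_{(S)(PQ)},\\
\Sigma_{(S)(PQ)} &= \sigma^2_{(S)(PQ)} - \frac{1}{R}\sigma^2_{(SP)(QR)},\\
\Sigma_{(SP)(R)} &= \sigma^2_{(SP)(R)} - \frac{1}{Q}\sigma^2_{(SP)(QR)},\\
\Sigma_{(SP)(QR)} &= \sigma^2_{(SP)(QR)}.
\end{aligned}$$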
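As a worked check of the definition, the σ²'s whose subscripts contain S can be screened one at a time for $\Sigma_{(S)}$. Each line lists the excess indices and the rightmost bracket in which they must lie; the rule then either supplies the coefficient or excludes the term.

```latex
\begin{aligned}
\sigma^2_{(SP)}:&\quad \{P\} \subseteq \{S,P\} &&\Rightarrow\ -\tfrac{1}{P}\,\sigma^2_{(SP)}\\
\sigma^2_{(S)(Q)}:&\quad \{Q\} \subseteq \{Q\} &&\Rightarrow\ -\tfrac{1}{Q}\,\sigma^2_{(S)(Q)}\\
\sigma^2_{(S)(PQ)}:&\quad \{P,Q\} \subseteq \{P,Q\} &&\Rightarrow\ +\tfrac{1}{PQ}\,\sigma^2_{(S)(PQ)}\\
\sigma^2_{(SP)(R)}:&\quad \{P,R\} \not\subseteq \{R\} &&\Rightarrow\ \text{excluded}\\
\sigma^2_{(SP)(QR)}:&\quad \{P,Q,R\} \not\subseteq \{Q,R\} &&\Rightarrow\ \text{excluded}
\end{aligned}
```

Adding the leading term $\sigma^2_{(S)}$ with coefficient one reproduces the expression for $\Sigma_{(S)}$ given above.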


The analysis of variance, introduced by R. A. Fisher in 1918, is a technique
which lays out in tabular form the breakdown of the total sum of squares into separate
parts, each of which can usually be given a supposedly physical meaning in the sense
that it describes an assignable source (or complex of sources) of variation.


Corresponding to any problem to which the application of the analysis of
variance technique is appropriate, there exist at least two major types of analyses of
variance: one for the population and one for the observed sample. For the purposes
of this paper we define the population analysis of variance to be the tabular
partitioning of the total sum of squares and degrees of freedom of all the responses of the
population into parts, each part corresponding to one type of component. The validity of
assertions 2 and 4 ensures that construction of such a table is possible for every balanced
population structure.


For each part we shall also exhibit the quotient of the sum of squares by the 
degrees of freedom, and we shall call the result the mean square of the part. Since 
the different quantities associated with a part are usually exhibited in an orderly fashion 
in a single line, we shall use the terms part and line interchangeably. 

As an immediate consequence of our definitions we see that for any particular
line:

$$\begin{aligned}
\text{Mean Square} &= \frac{(\text{product of population ranges of indices not involved in the type of component corresponding to the line}) \times \sum_{\text{indices of component}} (\text{component})^2}{\text{number of degrees of freedom}}\\
&= (\text{number of individual responses entering into a typical leading mean of the component}) \times \sigma^2_{\text{component}}.
\end{aligned}$$
3. BALANCED SAMPLE MEANS FROM BALANCED POPULATION STRUCTURES


In the mathematical models for the experimental designs we consider we 
take care that the models reflect a one-to-one correspondence with both the initial 
population structure and the physical way in which the experiment is to be performed. 
The use of design and sampling random variables, which we shall illustrate for specific 
instances later, has proved extremely valuable to this end. Since in our formulation 
we use derived or definitional population models our method in its initial general 
formulation does not depend on any special assumptions. Thus, it has the advantage
of allowing us to introduce simplifying assumptions only as they are needed, and so 
to explore in detail just how far a minimal set of assumptions will carry us. Finally, 
because in the present approach we always initially consider finite population struc- 
tures, fixed, random, and mixed situations come out as particular and usually simple 
cases of the general formulation. 


The details of the mathematical procedure we employ are as follows. The
conceptual counterpart of the samplings and random assignments of chosen entities
involved in the carrying out of the various experimental schemes can be obtained by
conceiving of carrying out similar operations in the population of index values
of all the possible responses. This is accomplished by the explicit use of the sampling
and design random variables. The statistical model for a sample observation is then
one indicating explicitly the physical process by which the given observation was or
is to be selected from the set of population values. From the statistical model for an
arbitrary experimental observation it is easy to see that every partial sample mean
is expressible as a sum of sample means of possibly different types from the various
population components. Thus, special groupings of sample observations induce
particular types of samples from the population components.


An example illustrating the situation is that of the simple one-fold hierarchal
or nested population. Here the typical population observation, $Y_{ij}$, is expressed
identically in terms of components as follows:

$$Y_{ij} = Y + (Y_{(i)} - Y) + (Y_{ij} - Y_{(i)}).$$

Any sample observation or any mean of sample observations will clearly involve a
sample of the $(Y_{(i)} - Y)$'s and a sample of the $(Y_{ij} - Y_{(i)})$'s. The particular samples
of the $(Y_{(i)} - Y)$'s and $(Y_{ij} - Y_{(i)})$'s involved depend on the sample of the $Y_{ij}$'s
actually chosen and are said to be induced by it.


We shall denote a particular sample observation by the symbol x, where the
subscripts of x indicate the sampling orders, in terms of the various population
classifications, in which the sample value was obtained. For reasons of mathematical
difficulty and also primary interest, we restrict our attention in the remainder of the
paper to types of samples which will henceforth be called balanced.

Definition: A sample is said to be balanced with respect to all subscripts
used in the representation of an arbitrary sample observation if the sample range of
any one of the subscripts is the same for every set of particular values the other
subscripts may assume.


In what follows we exhibit, in order of increasing complexity of structure
and derivation, results on the expectation of squares of sample means. We also define,
corresponding to any balanced sampling scheme, the sample analysis of variance.
Making use of the fact that the mean square of any line of the table can be written
as a known linear function of squares of sample means, we state and prove, in both
the σ² and Σ languages, the general theorem about the simple form of the expected
value of the mean square of any line.


Consider a random sample of size n from the population of N elements whose
values are denoted by $Y_i$, i = 1, 2, ..., N. Denote the population mean of the $Y_i$'s
by Y. Denote also by $x_{i^*}$, $i^* = 1, 2, \ldots, n$, the value of the $i^*$-th observed sample member
in order of selection. Define now nN random variables as follows:

$\alpha_i^{i^*} = 1$ if the $i^*$-th chosen item in the sample is the i-th item in the population,
$\alpha_i^{i^*} = 0$ otherwise.

Each of the selection variables has the following simple properties:

$$P(\alpha_i^{i^*} = 1) = \frac{1}{N}, \qquad P(\alpha_i^{i^*} = 0) = 1 - \frac{1}{N},$$
$$E(\alpha_i^{i^*}) = E(\alpha_i^{i^*})^2 = \frac{1}{N} \quad \text{for all } i, i^*,$$
$$E(\alpha_i^{i^*}\alpha_{i'}^{i'^*}) = \frac{1}{N(N-1)} \quad \text{for } i \neq i',\ i^* \neq i'^*.$$

Here, and in the remainder of the paper, the symbol P is used as a shorthand for the
word probability and the symbol E denotes the expectation operator.
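These properties can be confirmed by enumerating every ordered sample. The sketch assumes a hypothetical population of N = 5 items and samples of size n = 3 drawn without replacement; selection orders are 0-based in the code, so `alpha(s, 0, 2)` plays the role of $\alpha_2^{1}$ in the text's 1-based notation. The function name `alpha` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

N, n = 5, 3                                   # hypothetical population and sample sizes
samples = list(permutations(range(N), n))     # equally likely ordered samples, no replacement
w = Fraction(1, len(samples))                 # probability of each ordered sample

def alpha(sample, i_star, i):
    """Selection variable: 1 if the i*-th chosen item (0-based) is population item i, else 0."""
    return 1 if sample[i_star] == i else 0

E_alpha = sum(w * alpha(s, 0, 2) for s in samples)                      # E(alpha)
E_alpha_sq = sum(w * alpha(s, 0, 2) ** 2 for s in samples)              # E(alpha^2)
E_joint = sum(w * alpha(s, 0, 2) * alpha(s, 1, 4) for s in samples)     # i != i', i* != i'*
E_same_item = sum(w * alpha(s, 0, 2) * alpha(s, 1, 2) for s in samples) # same item twice
```

Here `E_alpha` and `E_alpha_sq` both equal 1/5 and `E_joint` equals 1/20. The last expectation is zero because an item cannot be drawn twice without replacement; that case is not among the properties listed in the text but follows from the same enumeration.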


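In terms of the elementary random variables the value of any $x_{i^*}$ may be
written

$$x_{i^*} = \sum_{i=1}^{N} \alpha_i^{i^*} Y_i = Y + \sum_{i=1}^{N} \alpha_i^{i^*}(Y_i - Y) = \mu + \sum_{i=1}^{N} \alpha_i^{i^*} A_i,$$

where $\mu = Y = E x_{i^*}$ = population mean, and $A_i = Y_i - Y$. Hence

$$E x_{i^*} = \mu + \frac{1}{N}\sum_{i=1}^{N} (Y_i - Y) = \mu, \quad \text{since } \sum_{i=1}^{N} (Y_i - Y) = 0.$$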
$ i TNR 
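Further,

$$E x_{i^*}^2 = E\Big(\sum_{i} \alpha_i^{i^*} Y_i\Big)^2 = \frac{1}{N}\sum_{i} Y_i^2 = Y^2 + \Big(1 - \frac{1}{N}\Big)\sigma^2_{(i)} = \Sigma + \Sigma_{(i)}, \quad \text{where } \sigma^2_{(i)} = \frac{\sum_i (Y_i - Y)^2}{N-1}.$$

It follows that $V(x_{i^*}) = E(x_{i^*}^2) - (E x_{i^*})^2 = \big(1 - \frac{1}{N}\big)\sigma^2_{(i)}$.

Further, $\bar{x} = \frac{1}{n}\sum_{i^*=1}^{n} x_{i^*} = \mu + \frac{1}{n}\sum_{i^*=1}^{n}\sum_{i=1}^{N} \alpha_i^{i^*} A_i$. Hence, employing the properties of the sampling random variables, we can show easily that

$$E\bar{x}^2 = \mu^2 + \Big(\frac{1}{n} - \frac{1}{N}\Big)\sigma^2_{(i)}.$$

We notice that the variance of the sample mean is $E\bar{x}^2 - \mu^2 = \big(\frac{1}{n} - \frac{1}{N}\big)\sigma^2_{(i)}$.
This, of course, is a well-known elementary finite sampling result. We focus attention,
however, on the fact that the square of a sample mean of size n from the population
of the Y's has expectation

$$E\bar{x}^2 = \Big(\mu^2 - \frac{1}{N}\sigma^2_{(i)}\Big) + \frac{1}{n}\sigma^2_{(i)} = \Sigma + \frac{1}{n}\Sigma_{(i)}.$$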
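Because the population is finite, $E\bar{x}^2$ can be computed exactly by averaging over every possible sample. The sketch below uses a hypothetical population of six values and samples of size n = 4, and compares the enumeration with both the σ² form and the Σ form, taking $\Sigma = \mu^2 - \sigma^2_{(i)}/N$ and $\Sigma_{(i)} = \sigma^2_{(i)}$ as defined in Section 2.

```python
from fractions import Fraction
from itertools import combinations

Y = [Fraction(v) for v in (3, 7, 1, 9, 4, 6)]        # hypothetical population values
N, n = len(Y), 4

mu = sum(Y) / N
sigma2_i = sum((y - mu) ** 2 for y in Y) / (N - 1)    # sigma^2_(i), divisor N - 1

# Under simple random sampling without replacement every unordered sample is equally likely.
samples = list(combinations(Y, n))
E_xbar2 = sum((sum(s) / n) ** 2 for s in samples) / len(samples)

Sigma = mu ** 2 - sigma2_i / N     # Sigma    = sigma^2 - sigma^2_(i)/N, with sigma^2 = Y^2
Sigma_i = sigma2_i                 # Sigma_(i) = sigma^2_(i)
```

For these values $\mu = 5$ and $\sigma^2_{(i)} = 42/5$, so both forms give $E\bar{x}^2 = 25 + 7/10 = 257/10$. Note that Σ itself can be negative; only the combination $\Sigma + \Sigma_{(i)}/n$ is a second moment.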


We have exemplified the use of sampling variables in the above simple illustration 
because these and also design random variables are used extensively in the paper 
in the formulation of derived linear models and also in the arguments of several of the 


underlying proofs. 


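The result $E\bar{x}^2 = \mu^2 + \big(\frac{1}{n} - \frac{1}{N}\big)\sigma^2_{(i)} = \Sigma + \frac{1}{n}\Sigma_{(i)}$ admits convenient
analogues in more dimensions. Thus, consider elements with values $P_{ij}$ arranged in a
two-way table of A rows and B columns, i = 1, 2, ..., A and j = 1, 2, ..., B, with the
values $P_{ij}$ subject to the condition

$$\sum_{i=1}^{A} P_{ij} = \sum_{j=1}^{B} P_{ij} = 0.$$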


A sample of elements is now chosen as follows. A random selection of a out of A
rows is made and, independently, a random selection of b columns out of B; the elements
of the sample are the elements at the intersections of the chosen rows and columns.

A sample drawn according to the above procedure is said to be a cross sample. The
term “bisampling” was used by Cornfield and Tukey (1956), but we do not favour it
because its use would necessitate terms like “trisampling” and so on. A schematic
representation of such a sample is given in Figure 1 below.


Fig. 1: A cross sample from a cross two-dimensional population (rows: entity A; columns: entity B).
The physical situation can again be described by introducing sampling random
variables. Let

$\alpha_i^{i^*} = 1$ if the $i^*$-th choice from among the i's selects all $P_{ij}$'s having the subscript i, and $\alpha_i^{i^*} = 0$ otherwise;
$\beta_j^{j^*} = 1$ if the $j^*$-th choice from among the j's selects all $P_{ij}$'s having the subscript j, and $\beta_j^{j^*} = 0$ otherwise.

An observed sample value, denoted by $x_{i^*j^*}$, is then expressed as a function
of the $P_{ij}$'s as

$$x_{i^*j^*} = \sum_{i=1}^{A}\sum_{j=1}^{B} \alpha_i^{i^*}\beta_j^{j^*} P_{ij}.$$


The expected value of the square of the sample mean, $\bar{x}$, can then easily be
shown to be

$$E\bar{x}^2 = \Big(1 - \frac{a}{A}\Big)\Big(1 - \frac{b}{B}\Big)\frac{\sigma^2}{ab}, \qquad \text{where } \sigma^2 = \frac{\sum_{i=1}^{A}\sum_{j=1}^{B} P_{ij}^2}{(A-1)(B-1)}.$$

Thus, a multiplicative correction factor of the form shown above appears corresponding
to every classification involved, and σ² is to be divided by the total sample size.

The condition $\sum_{i=1}^{A} P_{ij} = \sum_{j=1}^{B} P_{ij} = 0$ is one which is satisfied by the component
types of balanced population structures. It is not an assumed property.


It should be noticed that whenever complete selection of at least one of the
classifications is made, the general formula yields the value zero for $E\bar{x}^2$. This
is in agreement with the fact that under the above condition $\bar{x}$ is then identically
equal to zero. The condition is seen to correspond to the situation in which the classification
in question is termed fixed in common statistical terminology.
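The cross-sample formula, including its vanishing under complete selection, can be checked by exhaustive enumeration. In this sketch the table `raw` is hypothetical; double-centring it (subtracting row and column means and adding back the grand mean) is simply a convenient way, assumed here, to produce values $P_{ij}$ with zero row and column sums. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical table; double-centring yields P_ij with zero row and column sums.
raw = [[2, 5, 1, 7], [3, 0, 4, 6], [8, 2, 2, 1]]
A, B = len(raw), len(raw[0])
row = [Fraction(sum(r), B) for r in raw]
col = [Fraction(sum(raw[i][j] for i in range(A)), A) for j in range(B)]
grand = Fraction(sum(map(sum, raw)), A * B)
P_tab = [[raw[i][j] - row[i] - col[j] + grand for j in range(B)] for i in range(A)]

sigma2 = sum(p * p for r in P_tab for p in r) / ((A - 1) * (B - 1))

def expected_square_of_mean(a, b):
    """Exact E(xbar^2) over all cross samples: a rows and, independently, b columns."""
    total, count = Fraction(0), 0
    for rows in combinations(range(A), a):
        for cols in combinations(range(B), b):
            xbar = sum(P_tab[i][j] for i in rows for j in cols) / (a * b)
            total += xbar ** 2
            count += 1
    return total / count

def formula(a, b):
    """(1 - a/A)(1 - b/B) sigma^2 / (ab)."""
    return (1 - Fraction(a, A)) * (1 - Fraction(b, B)) * sigma2 / (a * b)
```

Taking `a = A` (every row selected) makes both functions return zero, matching the remark on fixed classifications.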

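The connection with the usual two-way balanced classification is as follows. Let $Y_{ij}$, $i = 1, 2, \ldots, A$; $j = 1, 2, \ldots, B$, be the typical response of the $ij$-th cell. Then $Y_{ij}$ admits an identical decomposition

$$Y_{ij} = \bar{Y} + (\bar{Y}_{i\cdot}-\bar{Y}) + (\bar{Y}_{\cdot j}-\bar{Y}) + (Y_{ij}-\bar{Y}_{i\cdot}-\bar{Y}_{\cdot j}+\bar{Y}) = \mu + A_i + B_j + (AB)_{ij}.$$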

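If the sample is drawn as described above, then the $i^*j^*$-th observed sample value is related to the $Y_{ij}$'s and to the population components by

$$x_{i^*j^*} = \mu + \sum_i\alpha_i^{i^*}A_i + \sum_j\beta_j^{j^*}B_j + \sum_i\sum_j\alpha_i^{i^*}\beta_j^{j^*}(AB)_{ij},$$

and

$$E\bar{x}^2 = \mu^2 + \left(1-\frac{a}{A}\right)\frac{\sigma^2_A}{a} + \left(1-\frac{b}{B}\right)\frac{\sigma^2_B}{b} + \left(1-\frac{a}{A}\right)\left(1-\frac{b}{B}\right)\frac{\sigma^2_{AB}}{ab} = \Sigma_0 + \frac{\Sigma_A}{a} + \frac{\Sigma_B}{b} + \frac{\Sigma_{AB}}{ab},$$

where

$$\sigma^2_A = \frac{\sum_i A_i^2}{A-1}, \qquad \sigma^2_B = \frac{\sum_j B_j^2}{B-1}, \qquad \sigma^2_{AB} = \frac{\sum_i\sum_j (AB)_{ij}^2}{(A-1)(B-1)},$$

$$\Sigma_0 = \mu^2 - \frac{\sigma^2_A}{A} - \frac{\sigma^2_B}{B} + \frac{\sigma^2_{AB}}{AB}, \qquad \Sigma_A = \sigma^2_A - \frac{\sigma^2_{AB}}{B}, \qquad \Sigma_B = \sigma^2_B - \frac{\sigma^2_{AB}}{A}, \qquad \Sigma_{AB} = \sigma^2_{AB}.$$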

By introducing a little more notation it can be shown without much trouble, as is done by Zyskind (1958), that the basic form of the above results remains valid regardless of the number of crossed classifications involved. The fact that the $\Sigma$ forms follow as a consequence of the $\sigma^2$ forms will be shown here later for situations of which the present one is a particular case.
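A similar sketch, again illustrative and using a hypothetical table `Y_cross` and hypothetical function names, decomposes a two-way table by the identity for $Y_{ij}$, forms the $\Sigma$'s from the $\sigma^2$'s as defined above, and checks the $\Sigma$ form of $E\bar{x}^2$ against exhaustive enumeration of cross samples.

```python
from fractions import Fraction
from itertools import combinations


def two_way_components(Y):
    """mu^2, sigma^2_A, sigma^2_B, sigma^2_AB from Y_ij = mu + A_i + B_j + (AB)_ij."""
    A, B = len(Y), len(Y[0])
    mu = Fraction(sum(map(sum, Y)), A * B)
    rm = [Fraction(sum(row), B) for row in Y]                            # row means
    cm = [Fraction(sum(Y[i][j] for i in range(A)), A) for j in range(B)]  # column means
    s2A = sum((m - mu) ** 2 for m in rm) / (A - 1)
    s2B = sum((m - mu) ** 2 for m in cm) / (B - 1)
    s2AB = sum((Y[i][j] - rm[i] - cm[j] + mu) ** 2
               for i in range(A) for j in range(B)) / ((A - 1) * (B - 1))
    return mu ** 2, s2A, s2B, s2AB


def two_way_capital_sigmas(Y):
    A, B = len(Y), len(Y[0])
    mu2, s2A, s2B, s2AB = two_way_components(Y)
    return {"0": mu2 - s2A / A - s2B / B + s2AB / (A * B),
            "A": s2A - s2AB / B,
            "B": s2B - s2AB / A,
            "AB": s2AB}


def e_sq_mean_Sigma_form(Y, a, b):
    S = two_way_capital_sigmas(Y)
    return S["0"] + S["A"] / a + S["B"] / b + S["AB"] / (a * b)


def e_sq_mean_enumerated(Y, a, b):
    A, B = len(Y), len(Y[0])
    sq = [Fraction(sum(Y[i][j] for i in rows for j in cols), a * b) ** 2
          for rows in combinations(range(A), a) for cols in combinations(range(B), b)]
    return sum(sq) / len(sq)


Y_cross = [[3, 5, 1, 4],
           [2, 7, 6, 0],
           [9, 1, 3, 2]]
assert e_sq_mean_enumerated(Y_cross, 2, 3) == e_sq_mean_Sigma_form(Y_cross, 2, 3)
```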

Consider next a one-fold nested population of $AB$ ordered items arranged into $A$ groups of $B$ items each. Denote the value of the $j$-th item of the $i$-th group by $Y_{ij}$. We have identically

$$Y_{ij} = \bar{Y} + (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i) = \mu + A_i + B_{(i)j},$$

where, for example, $A_i = \bar{Y}_i - \bar{Y}$ and $B_{(i)j} = Y_{ij} - \bar{Y}_i$. Since the population is a balanced one,

$$\sum_{i=1}^{A}A_i = 0; \qquad \sum_{j=1}^{B}B_{(i)j} = 0 \quad \text{for all values of } i.$$
Let the sampling procedure consist of choosing randomly $a$ out of $A$ groups, and in each of the groups selected choosing randomly $b$ out of the $B$ elements. Let $x_{i^*j^*}$ denote the value of the $j^*$-th chosen element in the $i^*$-th chosen group. We link the variable $x_{i^*j^*}$ to the values of the population items by defining elementary selection random variables of the type $\alpha_i^{i^*}$ and $\beta_{ij}^{i^*j^*}$, where, for example,

$\beta_{ij}^{i^*j^*} = 1$ if the $j^*$-th chosen unit of the $i^*$-th chosen group is the $j$-th unit of that group, and $\beta_{ij}^{i^*j^*} = 0$ otherwise.

It follows now that

$$x_{i^*j^*} = \sum_i\sum_j\alpha_i^{i^*}\beta_{ij}^{i^*j^*}Y_{ij} = \sum_i\sum_j\alpha_i^{i^*}\beta_{ij}^{i^*j^*}\left(\mu + A_i + B_{(i)j}\right) = \mu + \sum_i\alpha_i^{i^*}A_i + \sum_i\sum_j\alpha_i^{i^*}\beta_{ij}^{i^*j^*}B_{(i)j}.$$


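By making use of the properties of the random variables $\alpha_i^{i^*}$ and $\beta_{ij}^{i^*j^*}$, we can easily verify that

$$E\bar{x}^2 = \mu^2 + \frac{1}{a}\left(1-\frac{a}{A}\right)\sigma^2_A + \frac{1}{ab}\left(1-\frac{b}{B}\right)\sigma^2_{(A)(B)} = \Sigma_0 + \frac{\Sigma_A}{a} + \frac{\Sigma_{(A)(B)}}{ab}.$$

Also, for the mean $\bar{x}_{i^*}$ of the $i^*$-th chosen group,

$$E\bar{x}_{i^*}^2 = \mu^2 + \left(1-\frac{1}{A}\right)\sigma^2_A + \frac{1}{b}\left(1-\frac{b}{B}\right)\sigma^2_{(A)(B)} = \Sigma_0 + \Sigma_A + \frac{\Sigma_{(A)(B)}}{b},$$

where

$$\sigma^2_A = \frac{\sum_i A_i^2}{A-1}, \qquad \sigma^2_{(A)(B)} = \frac{\sum_i\sum_j B_{(i)j}^2}{A(B-1)},$$

$$\Sigma_0 = \mu^2 - \frac{\sigma^2_A}{A}, \qquad \Sigma_A = \sigma^2_A - \frac{\sigma^2_{(A)(B)}}{B}, \qquad \Sigma_{(A)(B)} = \sigma^2_{(A)(B)}.$$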


The generalizations of the formulas exhibited so far can without excessive 
trouble be shown to be valid. The detailed proofs are given by Zyskind (1958) and 
are here omitted for reasons of economy of space. 
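For the nested case, a brute-force check along the same lines is sketched below. It is an illustration under stated assumptions: a hypothetical $3 \times 4$ population `Y_nested` (three groups of four items), every choice of $a$ groups enumerated, and within each chosen group every choice of $b$ items; the names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def nested_e_sq_mean(Y, a, b):
    """Exact E[xbar^2]: a of the A groups at random, then b of the B items in each chosen group."""
    A, B = len(Y), len(Y[0])
    total, count = Fraction(0), 0
    for groups in combinations(range(A), a):
        for picks in product(combinations(range(B), b), repeat=a):
            s = sum(Y[g][j] for g, items in zip(groups, picks) for j in items)
            total += Fraction(s, a * b) ** 2
            count += 1
    return total / count


def nested_components(Y):
    """mu^2, sigma^2_A and sigma^2_(A)(B) of the one-fold nested population."""
    A, B = len(Y), len(Y[0])
    mu = Fraction(sum(map(sum, Y)), A * B)
    gm = [Fraction(sum(row), B) for row in Y]                 # group means
    s2A = sum((m - mu) ** 2 for m in gm) / (A - 1)
    s2AB = sum((Y[i][j] - gm[i]) ** 2 for i in range(A) for j in range(B)) / (A * (B - 1))
    return mu ** 2, s2A, s2AB


def nested_sigma2_form(Y, a, b):
    A, B = len(Y), len(Y[0])
    mu2, s2A, s2AB = nested_components(Y)
    return mu2 + (1 - Fraction(a, A)) * s2A / a + (1 - Fraction(b, B)) * s2AB / (a * b)


def nested_Sigma_form(Y, a, b):
    A, B = len(Y), len(Y[0])
    mu2, s2A, s2AB = nested_components(Y)
    Sig0, SigA, SigAB = mu2 - s2A / A, s2A - s2AB / B, s2AB
    return Sig0 + SigA / a + SigAB / (a * b)


Y_nested = [[4, 7, 1, 0],
            [2, 2, 9, 5],
            [6, 3, 3, 8]]
assert nested_e_sq_mean(Y_nested, 2, 2) == nested_Sigma_form(Y_nested, 2, 2)
```

Setting $a = 1$ gives the expectation for the mean of a single chosen group, $\Sigma_0 + \Sigma_A + \Sigma_{(A)(B)}/b$.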


Definition: The finite correction factor of size $j$ due to the index $i$ is the number

$$f_i^{\,j} = 1 - \frac{j}{\text{population range of } i}.$$


We now summarize the bulk of our conclusions thus far.

Theorem 1: The expectation of the square of any admissible partial mean, arising from a balanced sample of observations, is equal to a linear function of all the different components of variation of the population. The coefficient of each component of variation is the product of the finite correction factors, one factor for each index of the rightmost bracket of the component and each of the form $f_i^{a_i}$, where $a_i$ is the number of different values of the index $i$ entering into the partial mean, divided by the number of different values of the component entering into the formation of the partial mean in question.
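Theorem 1, together with the rule stated in the next paragraph for the size of each correction factor (size one when the index is held fixed by the partial mean, the sample range otherwise), can be written as a small function. The sketch below assumes a hypothetical encoding: a component is given by two Python sets, `X` (non-rightmost subscripts) and `Y` (rightmost subscripts), and a partial mean by the set `S1` of indices it holds fixed.

```python
from fractions import Fraction
from math import prod


def correction_factor(i, size, pop):
    """Finite correction factor of the given size due to index i: 1 - size / (population range of i)."""
    return 1 - Fraction(size, pop[i])


def theorem1_coefficient(X, Y, S1, samp, pop):
    """Coefficient of sigma^2_{(X)(Y)} in E[xbar_{S1}^2] according to Theorem 1.

    X, Y       -- sets of non-rightmost and rightmost subscripts of the component
    S1         -- subscripts held fixed by the partial mean (sample size one relative to it)
    samp, pop  -- dicts of sample and population ranges of each index
    """
    size = {i: (1 if i in S1 else samp[i]) for i in X | Y}
    correction = prod(correction_factor(i, size[i], pop) for i in Y)
    n_values = prod(size[i] for i in X | Y)   # different values of the component in the mean
    return Fraction(correction) / n_values


# Nested example: coefficient of sigma^2_(A)(B) in the square of a group mean is (1 - b/B)/b.
samp, pop = {"A": 2, "B": 3}, {"A": 5, "B": 8}
print(theorem1_coefficient(frozenset("A"), frozenset("B"), frozenset("A"), samp, pop))
```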


We notice that relative to a partial observational mean the sample size of index $i$ is one whenever the index corresponding to $i$ appears in the partial mean. This leads to the rule that the number of different components of a type entering into the formation of a mean is equal to the product of the sample sizes of the indices which the type of component has in excess over those of the admissible partial mean. Also,

$$f_i^{a_i} = 1 - \frac{1}{\text{population range of } i}$$

if the indices corresponding to $i$ appear in both the partial sample mean and the rightmost bracket of the component of variation; and if the indices corresponding to $i$ appear in the rightmost bracket of the component of variation but not in the partial mean, then

$$f_i^{a_i} = 1 - \frac{\text{sample range of } i}{\text{population range of } i}.$$


We now seek to express the result of the above theorem in $\Sigma$ form. Consider the term involving a particular $\sigma^2$. Let $X$ be the set of the non-rightmost subscripts of $\sigma^2$ and let $Y$ be the set of rightmost subscripts. Let $Z$ be an arbitrary set of subscripts such that $Z \subseteq Y$. Denote the number of subscripts in $Z$ by $q$, and for $q = 0$ define the product of the ranges of the subscripts in $Z$ to be one. Let $N_{X,Y-Z}$ denote the number of different components, whose type is specified by the set of indices $X + (Y - Z)$, entering into the formation of the partial sample mean. We carry out the argument in terms of the overall sample mean, since all results for partial sample means can then be obtained as special cases.


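The obvious expanded form of the term involving $\sigma^2_{(X)(Y)}$, obtained by multiplying out $\prod_{i\in Y}\left(1 - \frac{a_i}{\text{population range of } i}\right)\big/N_{X,Y}$ and noting that $N_{X,Y}\big/\prod_{i\in Z}a_i = N_{X,Y-Z}$, is

$$\sigma^2_{(X)(Y)}\sum_{Z\subseteq Y}\frac{(-1)^q}{\prod_{i\in Z}(\text{population range of } i)\;N_{X,Y-Z}}.$$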


Let $R$ be some fixed specified set of subscripts. In the completely expanded form of $E\bar{x}^2$ collect all terms for which $\sigma^2$ has the same $X$ and for which $Y - Z = R$, i.e. vary $Y$ and $Z$ subject to the restriction that $Y - Z = R$. The sum of all such terms is

$$\frac{1}{N_{X,R}}\sum_{Y\supseteq R}\frac{(-1)^q\,\sigma^2_{(X)(Y)}}{\prod_{i\in Y-R}(\text{population range of } i)} = \frac{\Sigma_{(X)(R)}}{N_{X,R}}$$

by definition of the $\Sigma$'s.
Hence

$$E\bar{x}^2 = \sum_{X,R}\frac{\Sigma_{(X)(R)}}{N_{X,R}} = \text{sum over all admissible pairs of } X \text{ and } R \text{ of } \frac{\Sigma_{(X)(R)}}{N_{X,R}}.$$


We notice that $N_{X,R} = \prod_{i\in X+R}a_i$, where $a_i$ is the sample range of the index $i$ in the complete sample, and $a_i = 1$ when the index ranges over the null set.

Let $S_1$ be the set of indices of any particular admissible partial sample mean. Then, relative to that mean, the number of different components whose type is specified by the set of indices $S_2 = X + R$ is $N = \prod_{i\in S_2-S_1}a_i$, and we see that an alternative statement of Theorem 1 is

$$E\bar{x}_{S_1}^2 = \sum_{S_2}\frac{\Sigma_{S_2}}{\prod_{i\in S_2-S_1}a_i}.$$


There should be no confusion generated by the use of $\Sigma$ for summation and for denoting ‘cap sigmas’.
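The passage from the $\sigma^2$ form to the $\Sigma$ form is a bookkeeping step that is easy to mechanize. In the sketch below (an illustration with hypothetical names), each component is a key `(X, Y)` of frozensets of index letters; every $Z \subseteq Y$ contributes $(-1)^q\sigma^2_{(X)(Y)}/\prod_{i\in Z}(\text{population range of } i)$ to $\Sigma_{X+(Y-Z)}$, and the two forms of $E\bar{x}^2$ for the overall mean are then compared.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def subsets(letters):
    letters = sorted(letters)
    for k in range(len(letters) + 1):
        for combo in combinations(letters, k):
            yield frozenset(combo)


def capital_sigmas(sigma2, pop):
    """Sigma_{S2}, S2 = X + R, collected from the expanded Theorem 1 terms.

    sigma2 maps (X, Y) -> sigma^2_{(X)(Y)}; pop gives population ranges.
    """
    out = {}
    for (X, Y), value in sigma2.items():
        for Z in subsets(Y):
            key = X | (Y - Z)
            out[key] = out.get(key, 0) + (-1) ** len(Z) * Fraction(value) / prod(pop[i] for i in Z)
    return out


def e_sq_mean_sigma2(sigma2, samp, pop):
    """Theorem 1 for the overall mean: sigma^2 times prod of f_i, divided by N_{X,Y}."""
    return sum(Fraction(v) * prod(1 - Fraction(samp[i], pop[i]) for i in Y)
               / prod(samp[i] for i in X | Y)
               for (X, Y), v in sigma2.items())


def e_sq_mean_Sigma(sigma2, samp, pop):
    """Alternative statement: sum over S2 of Sigma_{S2} / prod of sample ranges over S2."""
    return sum(v / prod(samp[i] for i in S2)
               for S2, v in capital_sigmas(sigma2, pop).items())


E0 = frozenset()
crossed = {(E0, E0): 4, (E0, frozenset("A")): 3, (E0, frozenset("B")): 5, (E0, frozenset("AB")): 7}
pop, samp = {"A": 6, "B": 10}, {"A": 2, "B": 3}
assert e_sq_mean_sigma2(crossed, samp, pop) == e_sq_mean_Sigma(crossed, samp, pop)
```

With a nested structure, entered as `{(E0, E0): ..., (E0, frozenset("A")): ..., (frozenset("A"), frozenset("B")): ...}`, the same code reproduces $\Sigma_A = \sigma^2_A - \sigma^2_{(A)(B)}/B$.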

We have already noticed that for balanced samples in which inspection of a population value is attained through the use of a finite number of sampling stages only, the subscripts of a typical sample value and of a typical population value correspond. The structure of the sample of observations is therefore identical with the structure of the population, and hence there is a unique identity for the typical sample observation analogous to the population identity for the typical response. Sample components and degrees of freedom are defined analogously to the corresponding population quantities. Further, we define the sample analysis of variance (abbreviated occasionally by ANOVA) according to type of component to be the tabular breakdown of the total sum of squares and degrees of freedom of the sample observations corresponding line for line to the population analysis of variance. The possibility of such a decomposition for every balanced sample is ensured by the existence of the sample identity and hence by the validity for the sample structure of the assertions made earlier about population structures.


For each line of the sample analysis of variance we wish to include the expression for the expected value of the mean square of the line. We shall make a correspondence between the indices used to denote sample quantities and the subscripts of all the $\sigma^2$'s and $\Sigma$'s, i.e. for the purpose of what follows we envisage identification of corresponding sample and population quantities by the same set of subscripts. Let $S$ denote the complete set of subscripts used in writing all the $\sigma^2$'s and $\Sigma$'s, $S_1$ denote the set of subscripts associated with the leading mean of a particular line, and $S_2$ denote the set of subscripts associated with a particular $\sigma^2$ and the corresponding $\Sigma$. Further, let $S_1 = X_1 + Y_1$ and $S_2 = X_2 + Y_2$, where we denote by $X$'s the respective sets of non-rightmost bracket subscripts and by $Y$'s the sets of rightmost bracket subscripts. Let $W = Y_2 - (Y_2 \cap S_1)$, and let $a_i$ and $A_i$ denote respectively the sample (complete sample) and population ranges of the subscript $i$. Both these ranges are defined to be equal to one when the subscript $i$ ranges over the null set.


We now state and prove the following theorem for arbitrary sample and 
population ranges. 


Theorem 2: The expected value of the mean square of the line for which the set of subscripts for the leading mean is $S_1$ has the form

$$\sum_{S_2\subseteq S}R(S_1,S_2)\,\sigma^2_{S_2} = \sum_{S_2\subseteq S}P(S_1,S_2)\,\Sigma_{S_2},$$

where the $R$'s and $P$'s are constants with values as follows:
(i) $R(S_1,S_2) = P(S_1,S_2) = 0$ if and only if $S_2$ does not contain $S_1$.
(ii) Whenever $S_1 \subseteq S_2$ we may write $P(S_1,S_2) = P(S_2)$ and $R(S_1,S_2) = P(S_2)\times Q(W)$, where

$$P(S_2) = \prod_{i\in S-S_2}a_i$$

= number of times any one component whose type is specified by $S_2$ enters into the complete sample used in the investigation, and

$$Q(W) = \prod_{i\in W}\left(1 - \frac{\text{sample range of index } i}{\text{population range of index } i}\right).$$


Corollaries: (1) The mean squares of the population analysis of variance may be obtained, in either $\sigma^2$ or $\Sigma$ forms, from the expected mean squares of the sample analysis of variance by replacing all sample quantities by the corresponding population quantities. Note that then the coefficients of all $\sigma^2$'s vanish except that of the $\sigma^2$ for which $S_2 = S_1$.

(2) If for a particular $\sigma^2$ appearing in the general expression for the expected value of the mean square of a given line, at least one of the excess subscripts of a rightmost bracket, i.e. at least one $i \in W$, has its sample range equal to its population range, as when the classification corresponding to the index is what is commonly called “fixed”, then the contribution of that $\sigma^2$ vanishes from the expectation when the specific sample sizes are taken into consideration.


(3) The actual value of the finite correction factor corresponding to an index 
whose population range is infinite is equal to one under all circumstances. 
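Theorem 2 can be checked numerically in the simplest crossed case. The sketch below is illustrative: it assumes a hypothetical $3 \times 4$ table `Y_table`, computes $R(S_1,S_2)$ from $P(S_2)$ and $Q(W)$ for the row line $S_1 = \{A\}$, and compares the resulting expected mean square, $b\sigma^2_A + (1-b/B)\sigma^2_{AB}$, with the exact average of the row mean square $b\sum_{i^*}(\bar{x}_{i^*}-\bar{x})^2/(a-1)$ over all cross samples.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def theorem2_R(S, S1, S2, Y2, samp, pop):
    """R(S1, S2): coefficient of sigma^2_{S2} in the expected mean square of line S1."""
    if not S1 <= S2:
        return Fraction(0)                       # part (i) of Theorem 2
    P = prod(samp[i] for i in S - S2)            # P(S2): times a value of S2 enters the sample
    W = Y2 - S1                                  # excess rightmost-bracket subscripts
    Q = prod(1 - Fraction(samp[i], pop[i]) for i in W)
    return Fraction(P) * Q


def two_way_sigma2_by_set(Y):
    """sigma^2 of each component of a two-way crossed table, keyed by its subscript set."""
    A, B = len(Y), len(Y[0])
    mu = Fraction(sum(map(sum, Y)), A * B)
    rm = [Fraction(sum(row), B) for row in Y]
    cm = [Fraction(sum(Y[i][j] for i in range(A)), A) for j in range(B)]
    return {
        frozenset(): mu ** 2,
        frozenset("A"): sum((m - mu) ** 2 for m in rm) / (A - 1),
        frozenset("B"): sum((m - mu) ** 2 for m in cm) / (B - 1),
        frozenset("AB"): sum((Y[i][j] - rm[i] - cm[j] + mu) ** 2
                             for i in range(A) for j in range(B)) / ((A - 1) * (B - 1)),
    }


def ems_rows_theorem2(Y, a, b):
    """E(MS) of the row line; in a crossed layout Y2 = S2 for every component."""
    A, B = len(Y), len(Y[0])
    S, S1 = frozenset("AB"), frozenset("A")
    samp, pop = {"A": a, "B": b}, {"A": A, "B": B}
    return sum(theorem2_R(S, S1, S2, S2, samp, pop) * v
               for S2, v in two_way_sigma2_by_set(Y).items())


def ems_rows_enumerated(Y, a, b):
    """Exact E(MS) of rows, averaging b * sum (row mean - grand mean)^2 / (a - 1) over all samples."""
    A, B = len(Y), len(Y[0])
    acc, n = Fraction(0), 0
    for rows in combinations(range(A), a):
        for cols in combinations(range(B), b):
            rmeans = [Fraction(sum(Y[i][j] for j in cols), b) for i in rows]
            grand = sum(rmeans) / a
            acc += b * sum((m - grand) ** 2 for m in rmeans) / (a - 1)
            n += 1
    return acc / n


Y_table = [[3, 5, 1, 4], [2, 7, 6, 0], [9, 1, 3, 2]]
assert ems_rows_theorem2(Y_table, 2, 3) == ems_rows_enumerated(Y_table, 2, 3)
```

With $b = B$ the factor $Q(W)$ for $\sigma^2_{AB}$ is zero, which is Corollary (2) for a fixed column classification.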

Proof of Theorem 2: The following proof makes use of Theorem 1 and its alternative statement and of the structure of lines in the ANOVA table.

Consideration of the scheme employed to denote sample observations and of assertion 5 shows that the expectation of the sum of squares for any line can be written as a product of the number of observations in the experiment and the expectation of a known linear function of squares of typical partial sample means. The form of both the function and the means is uniquely determined by the form of the leading mean of the line.


We begin by considering the contents of the expectation of the linear function 
with respect to a particular component of variation. 


Suppose $S_1$ is not contained in $S_2$. Then the leading partial mean has at least one subscript, say $i$, appearing in its rightmost bracket but not appearing at all in the component of variation. For every term containing the subscript $i$ in the expanded form of the sample component defined by the leading mean, there is one of opposite sign and with subscripts identical with those of the other term but with $i$ absent. Since the contents of the component of variation considered, and of the corresponding $\Sigma$, are identical in the expectations of the squares of both these means, it follows that the coefficient of this $\sigma^2$ and $\Sigma$ is zero in the overall expectation of the line. This completes the proof of part (i) of the theorem.

It remains to determine the values of $R$, $P$ and $Q$ for the case when $S_1 \subseteq S_2$. Though the direct derivation in terms of the $\sigma^2$'s is instructive, we shall argue in terms of the simpler $\Sigma$ forms.

Let $Z$ be an arbitrary subset of $Y_1$ and let the number of subscripts in $Z$ be denoted by $q$. We first find the coefficient of $\Sigma_{S_2}$ in the expected value of the sum of squares of the line for $S_1$. Consideration of the suggested form of the expected value of the sum of squares shows that coefficient to be

$$\left(\prod_{i\in S}a_i\right)\sum_{Z\subseteq Y_1}\frac{(-1)^q}{\prod_{i\in S_2-(S_1-Z)}a_i} = \prod_{i\in S-(S_2-S_1)}a_i\,\prod_{i\in Y_1}\left(1-\frac{1}{a_i}\right) = \prod_{i\in S-S_2}a_i\,\prod_{i\in X_1}a_i\,\prod_{i\in Y_1}(a_i-1).$$

Hence, since the line for $S_1$ has $\prod_{i\in X_1}a_i\prod_{i\in Y_1}(a_i-1)$ degrees of freedom, the coefficient of $\Sigma_{S_2}$ in the expected value of the mean square of the line for $S_1$ is $P(S_2) = \prod_{i\in S-S_2}a_i$ = number of times any one component whose type is specified by $S_2$ enters into the complete sample.
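We now wish to find the coefficient of $\sigma^2_{S_2} = \sigma^2_{(X_2)(Y_2)}$. From the definition of the $\Sigma$'s we know that $\sigma^2_{(X_2)(Y_2)}$ appears in those and only those $\Sigma_{S_j}$'s for which $S_2 - S_j = L_j \subseteq Y_2$. But in the present instance $S_2$ and all $S_j$'s of relevance contain $S_1$ as a subset, so $S_2 - S_j$ contains no subscript of $S_1$, i.e. $L_j \cap S_1 = \varnothing$. Also, since $S_1 \subseteq S_2$, and because of the way in which permissible partial means are defined, every subscript common to $Y_2$ and $S_1$ must appear in $Y_1$. Hence, the actual restriction on $L_j$ is $L_j \subseteq Y_2 - Y_1 = W$.

The coefficient of $\Sigma_{S_j}$ is

$$\prod_{i\in S-S_j}a_i = \prod_{i\in(S-S_2)+(S_2-S_j)}a_i = \left(\prod_{i\in S-S_2}a_i\right)\left(\prod_{i\in L_j}a_i\right).$$

Let the number of subscripts in $L_j$ be $q_j$. Then the coefficient of $\sigma^2_{S_2}$ is seen to be

$$\sum_{L_j\subseteq W}\left(\prod_{i\in S-S_2}a_i\right)\left(\prod_{k\in L_j}a_k\right)\frac{(-1)^{q_j}}{\prod_{k\in L_j}A_k} = \left(\prod_{i\in S-S_2}a_i\right)\prod_{i\in W}\left(1-\frac{a_i}{A_i}\right) = P(S_2)\times Q(W).$$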


This completes the proof of the theorem. It should be noted that the $\sigma^2$ form of the theorem is essentially known and that equivalent forms are given by Bennett and Franklin (1954), and also by Cornfield and Tukey (1956).


The merit of the present formulation lies in pointing out the simple form 
of the expected values of squares of sample means and how advantage of this may 
be taken when the structure of the lines of the sample analysis of variance table is 


defined precisely. Some tables exemplifying the application of the present theorem 
will be given later. 


4. RANDOMIZED EXPERIMENTS


Up to this point we have considered the consequences of drawing pure samples, i.e. combinations of crossed and nested samples only. However, when, as is done in actual designed experimental investigations, the physical act of randomization is also employed, yet another type of sample is induced in some of the components. Thus, if a response, $Y_{ik}$, is determined by the treatment $i$ and the unit $k$ to which the treatment is applied, then in the sample the unit $k$ can only be used once. A sample which is obtained by randomly applying the chosen treatments to the chosen experimental units, with $r$ applications of each chosen treatment, will then have a structure as indicated in Figure 2 below, where T and P correspond to the treatment and unit entities respectively. A sample drawn according to such a scheme is said to be a simple fractionated sample.


Fig. 2: A simple fractionated sample (entity T: treatments; entity P: experimental units).


A sample of this type is induced in the treatment-experimental unit type of component, $(TP)_{ik}$. If the randomly chosen treatments are $t$ in number out of $T$, and the randomly chosen units are $rt$ out of $P$, then the expected value of the square of the overall induced sample mean, $\bar{x}$, of the elements $(TP)_{ik}$ is given by

$$E\bar{x}^2 = \left[\left(1-\frac{1}{T}\right)-\frac{r}{P}\left(1-\frac{t}{T}\right)\right]\frac{\sigma^2_{(TP)}}{rt}, \qquad \sigma^2_{(TP)} = \frac{\sum_i\sum_k (TP)_{ik}^2}{(T-1)(P-1)}.$$
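Again the form of this result has an immediate generalization in higher dimensions. Thus, if the treatments, denoted by A and B respectively, have a factorial structure, then the generalization of the above formula is

$$E\bar{x}^2 = \left[\left(1-\frac{1}{A}\right)\left(1-\frac{1}{B}\right)-\frac{r}{P}\left(1-\frac{a}{A}\right)\left(1-\frac{b}{B}\right)\right]\frac{\sigma^2_{(ABP)}}{rab},$$

where $\sigma^2_{(ABP)}$ is the sum of squares of the elements of the treatment-combination by unit component divided by $(A-1)(B-1)(P-1)$. Here $a$ levels of A are chosen randomly out of $A$, and similarly and independently $b$ levels of B out of $B$; also each chosen treatment combination is applied at random to exactly $r$ experimental units, where altogether $rab$ units are randomly chosen out of $P$ and no unit can be used over in the experiment.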
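The factorial formula can be checked in the same brute-force way. The sketch below is an illustration under stated assumptions: a hypothetical $3 \times 2 \times 4$ array `P_fact` whose sums over each index separately are zero (built from zero-sum contrast vectors), every choice of $a$ levels of A and $b$ levels of B, and every ordered draw of $rab$ distinct units, the $n$-th treatment combination receiving the $n$-th block of $r$ units.

```python
from fractions import Fraction
from itertools import combinations, permutations


def factorial_e_sq_mean(P, a, b, r):
    """Exact E[xbar^2]: a of A levels, b of B levels, each of the ab combinations
    applied to r units, all rab units distinct."""
    A, B, U = len(P), len(P[0]), len(P[0][0])
    total, count = Fraction(0), 0
    for levels_a in combinations(range(A), a):
        for levels_b in combinations(range(B), b):
            cells = [(i, k) for i in levels_a for k in levels_b]
            for units in permutations(range(U), r * a * b):
                s = sum(P[i][k][u] for n, (i, k) in enumerate(cells)
                        for u in units[n * r:(n + 1) * r])
                total += Fraction(s, r * a * b) ** 2
                count += 1
    return total / count


def factorial_formula(P, a, b, r):
    A, B, U = len(P), len(P[0]), len(P[0][0])
    ss = sum(x * x for plane in P for row in plane for x in row)
    sigma2 = Fraction(ss, (A - 1) * (B - 1) * (U - 1))
    factor = ((1 - Fraction(1, A)) * (1 - Fraction(1, B))
              - Fraction(r, U) * (1 - Fraction(a, A)) * (1 - Fraction(b, B)))
    return factor * sigma2 / (r * a * b)


# Hypothetical component: zero sums over i, over k and over u separately.
c1, c2, d = (1, 0, -1), (1, -2, 1), (1, -1)
e1, e2 = (1, -1, 2, -2), (0, 1, -1, 0)
P_fact = [[[d[k] * (c1[i] * e1[u] + c2[i] * e2[u]) for u in range(4)]
           for k in range(2)] for i in range(3)]
assert factorial_e_sq_mean(P_fact, 2, 1, 2) == factorial_formula(P_fact, 2, 1, 2)
```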

It should be noted that the second part of the correction factor bears a similarity to the type of correction factor obtained in the crossed case. Further, while the second part may for appropriate sample sizes become zero, this will never be the case with the first part of the correction factor.


We now formalize and prove our statements for the case of the two-dimensional simple fractionated sample.

Consider a set of $AP$ items with values denoted by $P_{iu}$, where $i = 1, 2, \ldots, A$; $u = 1, 2, \ldots, P$, and such that

$$\sum_{i=1}^{A}P_{iu} = \sum_{u=1}^{P}P_{iu} = 0.$$

Partition the $P_{iu}$'s into sets having like $i$'s and choose at random $a$ such sets. Within each chosen set select a random subset of size $r$, but with the restriction that no two chosen $P_{iu}$'s may have like values of $u$. Pictorially our sample will have the structure indicated in Figure 3. Introduce random variables $\alpha_i^{i^*}$ and $\rho_u^{i^*f}$, with $i^* = 1, 2, \ldots, a$ and $f = 1, 2, \ldots, r$, satisfying

$\alpha_i^{i^*} = 1$ if the $i^*$-th choice from among the $i$'s selects all $P_{iu}$'s having subscript $i$, and $\alpha_i^{i^*} = 0$ otherwise;
$\rho_u^{i^*f} = 1$ if the $f$-th selection within the $i^*$-th selected set corresponds to a $P_{iu}$ value with second subscript $u$, and $\rho_u^{i^*f} = 0$ otherwise.

Then

$$E(\alpha_i^{i^*})^2 = E(\alpha_i^{i^*}) = \frac{1}{A}, \qquad E(\alpha_i^{i^*}\alpha_{i'}^{i^{*\prime}}) = \frac{1}{A(A-1)}, \quad i^*\neq i^{*\prime},\ i\neq i';$$

$$E(\rho_u^{i^*f})^2 = E(\rho_u^{i^*f}) = \frac{1}{P}, \qquad E(\rho_u^{i^*f}\rho_{u'}^{i^{*\prime}f'}) = \frac{1}{P(P-1)}, \quad (i^*,f)\neq(i^{*\prime},f'),\ u\neq u'.$$

The $\alpha$'s are independent of the $\rho$'s, and every other product of two distinct $\alpha$'s, or of two distinct $\rho$'s, has expectation zero. Denote the $(i^*f)$-th selected value by $x_{i^*f}$. Then

$$x_{i^*f} = \sum_i\sum_u\alpha_i^{i^*}\rho_u^{i^*f}P_{iu}.$$

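Theorem 3: The expected value of the square of the sample mean, $\bar{x} = \frac{1}{ar}\sum_{i^*}\sum_f x_{i^*f}$, for the sampling procedure specified above is

$$E\bar{x}^2 = \left[\left(1-\frac{1}{A}\right)-\frac{r}{P}\left(1-\frac{a}{A}\right)\right]\frac{\sigma^2_{(AP)}}{ar}, \qquad \sigma^2_{(AP)} = \frac{\sum_i\sum_u P_{iu}^2}{(A-1)(P-1)}.$$

Proof: Expanding the square of the sum and grouping the pairs of observations according to whether they coincide, lie in the same selected set, or lie in different selected sets,

$$a^2r^2\,E\bar{x}^2 = \sum_{i^*}\sum_f E x_{i^*f}^2 + \sum_{i^*}\sum_{f\neq f'}E(x_{i^*f}x_{i^*f'}) + \sum_{i^*\neq i^{*\prime}}\sum_f\sum_{f'}E(x_{i^*f}x_{i^{*\prime}f'}).$$

By the moments of the $\alpha$'s and $\rho$'s, and since $\sum_{u'\neq u}P_{iu'} = -P_{iu}$ and $\sum_{i'\neq i}P_{i'u} = -P_{iu}$,

$$E x_{i^*f}^2 = \frac{1}{AP}\sum_i\sum_u P_{iu}^2,$$

$$E(x_{i^*f}x_{i^*f'}) = \frac{1}{AP(P-1)}\sum_i\sum_{u\neq u'}P_{iu}P_{iu'} = -\frac{1}{AP(P-1)}\sum_i\sum_u P_{iu}^2,$$

$$E(x_{i^*f}x_{i^{*\prime}f'}) = \frac{1}{A(A-1)P(P-1)}\sum_{i\neq i'}\sum_{u\neq u'}P_{iu}P_{i'u'} = \frac{1}{A(A-1)P(P-1)}\sum_i\sum_u P_{iu}^2.$$

Hence

$$E\bar{x}^2 = \frac{\sum_i\sum_u P_{iu}^2}{a^2r^2}\left[\frac{ar}{AP} - \frac{ar(r-1)}{AP(P-1)} + \frac{a(a-1)r^2}{A(A-1)P(P-1)}\right] = \left[\left(1-\frac{1}{A}\right)-\frac{r}{P}\left(1-\frac{a}{A}\right)\right]\frac{\sigma^2_{(AP)}}{ar}.$$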

As mentioned before, the generalization suggested by this two-dimensional case holds also when the number of dimensions is $R + 1$, $R > 1$. Proof of this may be developed along the above lines, making use of the method of induction. The details are given by Zyskind (1958).
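Theorem 3 lends itself to an exact check by enumeration. The sketch below is an illustration, not the author's procedure: it assumes a hypothetical $3 \times 4$ array `P_frac` with zero row and column sums, enumerates every choice of $a$ treatments and every ordered draw of $ar$ distinct units (the $n$-th chosen treatment receiving the $n$-th block of $r$ units, which realizes the random allocation), and compares the average of $\bar{x}^2$ with the theorem.

```python
from fractions import Fraction
from itertools import combinations, permutations


def fractionated_e_sq_mean(P, a, r):
    """Exact E[xbar^2] over all simple fractionated samples: a of the A treatments,
    each applied to r units, with all a*r units distinct."""
    A, U = len(P), len(P[0])
    total, count = Fraction(0), 0
    for treats in combinations(range(A), a):
        for units in permutations(range(U), a * r):
            s = sum(P[t][u] for n, t in enumerate(treats)
                    for u in units[n * r:(n + 1) * r])
            total += Fraction(s, a * r) ** 2
            count += 1
    return total / count


def theorem3(P, a, r):
    """[(1 - 1/A) - (r/P)(1 - a/A)] sigma^2 / (ar), sigma^2 = sum P_iu^2 / ((A-1)(P-1))."""
    A, U = len(P), len(P[0])
    sigma2 = Fraction(sum(x * x for row in P for x in row), (A - 1) * (U - 1))
    return ((1 - Fraction(1, A)) - Fraction(r, U) * (1 - Fraction(a, A))) * sigma2 / (a * r)


# Hypothetical array with zero row sums and zero column sums.
P_frac = [[1, -1, 2, -2],
          [0, 1, -1, 0],
          [-1, 0, -1, 2]]
assert fractionated_e_sq_mean(P_frac, 2, 2) == theorem3(P_frac, 2, 2)
```

Unlike the crossed case, the bracket does not vanish when $a = A$; it vanishes only when a single treatment receives every unit ($a = 1$, $r = P$), since $\bar{x}$ is then a complete row sum.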

Again, as in the case of balanced crossed and nested samples, the correction factors involved in simple fractionated samples involve only letters corresponding to subscripts of the rightmost bracket of the component in question. For example, in a randomized block the plots, P, are nested in the blocks, B, so the letters of the rightmost bracket of the interaction of the treatments with plots within blocks are T and P. The expected value of the square of the overall mean induced in this interaction is

$$E\bar{x}^2 = \left[\left(1-\frac{1}{T}\right)-\frac{r}{P}\left(1-\frac{t}{T}\right)\right]\frac{\sigma^2_{(B)(PT)}}{brt},$$

where $b$ is the number of randomly chosen blocks out of $B$ and

$$\sigma^2_{(B)(PT)} = \frac{\sum_i\sum_j\sum_k (PT)_{(i)jk}^2}{B(P-1)(T-1)}.$$


We now pause to illustrate, by means of the randomized block design, in some degree of detail the nature of our results thus far. As stated before, the population structure for the randomized block is $(B:P)(T)$, or $(i:j)(k)$.


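The population identity for the typical response $Y_{ijk}$ therefore is

$$Y_{ijk} = \bar{Y} + (\bar{Y}_i-\bar{Y}) + (\bar{Y}_k-\bar{Y}) + (\bar{Y}_{ik}-\bar{Y}_i-\bar{Y}_k+\bar{Y}) + (\bar{Y}_{ij}-\bar{Y}_i) + (Y_{ijk}-\bar{Y}_{ij}-\bar{Y}_{ik}+\bar{Y}_i)$$

$$= \mu + B_i + T_k + (BT)_{ik} + P_{(i)j} + (PT)_{(i)jk}.$$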
The experimental procedure consists of choosing randomly $b$ blocks out of $B$, and in each chosen block $rt$ experimental units out of $P$; of selecting $t$ treatments out of $T$; and, within each chosen block, of applying at random each of the $t$ selected treatments to exactly $r$ experimental units. Denote by $x_{i^*k^*f}$ the observed value associated with the $f$-th selected repetition of the $k^*$-th chosen treatment in the $i^*$-th selected block. The sample observational structure relative to this notation therefore is $(i^*k^*:f)$. This implies the sample observational identity

$$x_{i^*k^*f} = \bar{x} + (\bar{x}_{i^*}-\bar{x}) + (\bar{x}_{k^*}-\bar{x}) + (\bar{x}_{i^*k^*}-\bar{x}_{i^*}-\bar{x}_{k^*}+\bar{x}) + (x_{i^*k^*f}-\bar{x}_{i^*k^*})$$

and the corresponding sample analysis of variance.


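In order to link the observed values to the population quantities we introduce sets of sample and design random variables $\alpha_i^{i^*}$ (choice of blocks), $\beta_k^{k^*}$ (choice of treatments) and $\gamma_{ij}^{i^*k^*f}$ (choice and random allocation of plots), where again the meaning of the separate variables should be clear from the context, and their elementary properties can be easily derived. The statistical model is

$$x_{i^*k^*f} = \mu + \sum_i\alpha_i^{i^*}B_i + \sum_k\beta_k^{k^*}T_k + \sum_i\sum_k\alpha_i^{i^*}\beta_k^{k^*}(BT)_{ik} + \sum_i\sum_j\alpha_i^{i^*}\gamma_{ij}^{i^*k^*f}P_{(i)j} + \sum_i\sum_j\sum_k\alpha_i^{i^*}\beta_k^{k^*}\gamma_{ij}^{i^*k^*f}(PT)_{(i)jk}.$$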
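By making use directly of the properties of the random variables involved, or otherwise by observing the types of samples that are induced in the various types of population components, we can easily see that

$$E\bar{x}^2 = \mu^2 + \frac{1}{b}\left(1-\frac{b}{B}\right)\sigma^2_B + \frac{1}{t}\left(1-\frac{t}{T}\right)\sigma^2_T + \frac{1}{bt}\left(1-\frac{b}{B}\right)\left(1-\frac{t}{T}\right)\sigma^2_{BT} + \frac{1}{brt}\left(1-\frac{rt}{P}\right)\sigma^2_{(B)(P)} + \frac{1}{brt}\left[\left(1-\frac{1}{T}\right)-\frac{r}{P}\left(1-\frac{t}{T}\right)\right]\sigma^2_{(B)(PT)}.$$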


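On regrouping terms we obtain

$$E\bar{x}^2 = \left(\mu^2 - \frac{\sigma^2_B}{B} - \frac{\sigma^2_T}{T} + \frac{\sigma^2_{BT}}{BT}\right) + \frac{1}{b}\left(\sigma^2_B - \frac{\sigma^2_{BT}}{T} - \frac{\sigma^2_{(B)(P)}}{P} + \frac{\sigma^2_{(B)(PT)}}{PT}\right)$$

$$+ \frac{1}{t}\left(\sigma^2_T - \frac{\sigma^2_{BT}}{B}\right) + \frac{1}{bt}\left(\sigma^2_{BT} - \frac{\sigma^2_{(B)(PT)}}{P}\right) + \frac{1}{brt}\left(\sigma^2_{(B)(P)} - \frac{\sigma^2_{(B)(PT)}}{T} + \sigma^2_{(B)(PT)}\right).$$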


On insertion of $\Sigma$ terms the above expression assumes the simple form

$$E\bar{x}^2 = \Sigma_0 + \frac{1}{b}\Sigma_B + \frac{1}{t}\Sigma_T + \frac{1}{bt}\Sigma_{BT} + \frac{1}{brt}\left(\Sigma_{(B)(P)} + \Sigma_{(B)(PT)}\right),$$

where each $\Sigma$ is the corresponding bracketed combination of $\sigma^2$'s above; in the last bracket, $\Sigma_{(B)(P)} = \sigma^2_{(B)(P)} - \sigma^2_{(B)(PT)}/T$ and $\Sigma_{(B)(PT)} = \sigma^2_{(B)(PT)}$.

As might have been anticipated, the expression for $E\bar{x}^2$ in terms of the $\Sigma$'s retains the form it had in the case of ordinary balanced samples only. The additional operation of random confounding has not altered the form of the $\Sigma$ expansion for $E\bar{x}^2$. As before, the coefficient of each $\Sigma$ is the reciprocal of the number of possibly different values of the type of component in question entering into the formation of the overall sample mean.
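The regrouping from the $\sigma^2$ form to the $\Sigma$ form for the randomized block is purely algebraic, so it can be confirmed with exact arithmetic for arbitrary component values. In the sketch below the dictionary keys (`"mu2"` for $\mu^2$, `"P"` for $\sigma^2_{(B)(P)}$, `"PT"` for $\sigma^2_{(B)(PT)}$, and so on) and the numerical values are hypothetical.

```python
from fractions import Fraction


def rb_sigma2_form(s, b, t, r, B, T, P):
    """E[xbar^2] for the randomized block, one term per component (sigma^2 form)."""
    return (s["mu2"]
            + (1 - Fraction(b, B)) * s["B"] / b
            + (1 - Fraction(t, T)) * s["T"] / t
            + (1 - Fraction(b, B)) * (1 - Fraction(t, T)) * s["BT"] / (b * t)
            + (1 - Fraction(r * t, P)) * s["P"] / (b * r * t)
            + ((1 - Fraction(1, T)) - Fraction(r, P) * (1 - Fraction(t, T))) * s["PT"] / (b * r * t))


def rb_capital_sigmas(s, B, T, P):
    """The Sigmas read off from the regrouped sigma^2 expression."""
    return {"0": s["mu2"] - s["B"] / B - s["T"] / T + s["BT"] / (B * T),
            "B": s["B"] - s["BT"] / T - s["P"] / P + s["PT"] / (P * T),
            "T": s["T"] - s["BT"] / B,
            "BT": s["BT"] - s["PT"] / P,
            "P": s["P"] - s["PT"] / T,
            "PT": s["PT"]}


def rb_Sigma_form(s, b, t, r, B, T, P):
    S = rb_capital_sigmas(s, B, T, P)
    return (S["0"] + S["B"] / b + S["T"] / t + S["BT"] / (b * t)
            + (S["P"] + S["PT"]) / (b * r * t))


# Hypothetical component values (exact fractions).
s = {k: Fraction(v) for k, v in {"mu2": 9, "B": 4, "T": 3, "BT": 2, "P": 5, "PT": 7}.items()}
assert rb_sigma2_form(s, 2, 3, 2, B=6, T=5, P=30) == rb_Sigma_form(s, 2, 3, 2, B=6, T=5, P=30)
```

Calling either function with `b=1` gives the expectation for a single block mean, as used below.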


The applicability of the form of the above expression is very general indeed. 
It has been verified to hold in the generalized case of the completely randomized, 
randomized block, latin square, and various modifications of the split-plot designs. 
It has also been shown to hold for the general case of the balanced incomplete block 
design and for a fair number of detailed situations in which treatments are subject 


to error. 


Expected values for partial observational means can easily be obtained from the above by simple substitution when the simple expansions for $E\bar{x}^2$ are valid. Thus, in the case of the randomized block, if $\bar{x}_{i^*}$ is the observed mean of the $i^*$-th chosen block, then $E\bar{x}_{i^*}^2$ is obtained from the expression for $E\bar{x}^2$ by substituting the value $b = 1$. Thus,

$$E\bar{x}_{i^*}^2 = \Sigma_0 + \Sigma_B + \frac{\Sigma_T}{t} + \frac{\Sigma_{BT}}{t} + \frac{\Sigma_{(B)(P)} + \Sigma_{(B)(PT)}}{rt}.$$


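The actual simplicity of the above form may perhaps better be put in evidence by pointing out its complete analogy with the $\sigma^2$ forms obtained in taking expectations of squares of partial means in a two-way table, where all the components, apart from the overall mean, are completely random and uncorrelated. Thus, consider the model

$$y_{ij} = \mu + a_i + b_j + (ab)_{ij}; \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b,$$

where $\mu$ is a constant, the $a_i$ are a random sample of size $a$ and the $b_j$'s a sample of size $b$, the $(ab)_{ij}$'s a sample of size $ab$; all samples are independent and from infinite populations with means zero and finite variances. Denote $\mu^2$ by $\sigma^2_0$, and let

$$\operatorname{var}(a_i) = \sigma^2_a, \qquad \operatorname{var}(b_j) = \sigma^2_b, \qquad \operatorname{var}((ab)_{ij}) = \sigma^2_{ab}.$$

Then

$$E y_{ij}^2 = \sigma^2_0 + \sigma^2_a + \sigma^2_b + \sigma^2_{ab},$$

$$E\bar{y}_{i\cdot}^2 = E\left(\frac{1}{b}\sum_j y_{ij}\right)^2 = \sigma^2_0 + \sigma^2_a + \frac{\sigma^2_b}{b} + \frac{\sigma^2_{ab}}{b},$$

$$E\bar{y}_{\cdot j}^2 = \sigma^2_0 + \frac{\sigma^2_a}{a} + \sigma^2_b + \frac{\sigma^2_{ab}}{a},$$

$$E\bar{y}_{\cdot\cdot}^2 = E\left(\frac{1}{ab}\sum_i\sum_j y_{ij}\right)^2 = \sigma^2_0 + \frac{\sigma^2_a}{a} + \frac{\sigma^2_b}{b} + \frac{\sigma^2_{ab}}{ab},$$

i.e. the contribution of $\sigma^2_i$, $i = 0, a, b, ab$, to the expectation of the square of a particular sample mean is equal to $\sigma^2_i$ divided by the number of possibly different $i$ elements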
entering into the particular sample mean. With proper choice of sample indices 
the analogy can also be carried forcefully to the corresponding analysis of variance 
tables of expected mean squares, since for orthogonal sample analyses of variance 
each such expected mean square can be written out simply and uniquely as a known 
linear function of expected values of squares of partial observational means. 


We now formalize our results. Let $\Sigma_{(X)(R)}$ denote the $\Sigma$ corresponding to the type of population component for which the set of letters belonging to the rightmost bracket is $R$ and for which $X$ is the set of letters not belonging to the rightmost bracket. Let $N_{(X)(R)}$ denote the number of values of component type $(X)(R)$ entering into the formation of the complete experimental sample. Then $E\bar{x}^2$ is said to admit the standard $\Sigma$ expansion if

$$E\bar{x}^2 = \sum_{X,R}\frac{\Sigma_{(X)(R)}}{N_{(X)(R)}}.$$

Denote the subscripts of some particular admissible sample partial mean by $S_1$, and by $N_{(X)(R),S_1}$ the number of possibly different values of components of type $(X)(R)$ entering into the formation of $\bar{x}_{S_1}$. Then, if $E\bar{x}^2$ admits the standard $\Sigma$ expansion, it follows that

$$E\bar{x}_{S_1}^2 = \sum_{X,R}\frac{\Sigma_{(X)(R)}}{N_{(X)(R),S_1}}.$$


A sufficient condition for the standard $\Sigma$ form of $E\bar{x}^2$ to hold is that all of the balanced samples involved be of the cross, nested, or simple fractionated type. A more relaxed sufficient condition may be obtained by realizing that the coefficients preceding the $\sigma^2$'s which arise in the above situations satisfy also the condition exemplified by the following illustration.


The full term involving $\sigma^2_{(B)(PT)}$ in the $\sigma^2$ expression for $E\bar{x}^2$ in the case of the randomized block is

$$\frac{1}{brt}\left[\left(1-\frac{1}{T}\right)-\frac{r}{P}\left(1-\frac{t}{T}\right)\right]\sigma^2_{(B)(PT)}.$$

It can also be written as

$$\frac{\sigma^2_{(B)(PT)}}{brt} - \frac{1}{T}\,\frac{\sigma^2_{(B)(PT)}}{brt} - \frac{1}{P}\,\frac{\sigma^2_{(B)(PT)}}{bt} + \frac{1}{PT}\,\frac{\sigma^2_{(B)(PT)}}{b}.$$

If we denote by $Z$ the variable set of letters in the denominators of the expressions of which $\sigma^2_{(B)(PT)}$ is the numerator above, and if we denote by $N_{(B)(PT-Z)}$ the number of values of components of type $(B)(PT-Z)$ entering into the formation of the observed experimental sample mean which induced the sample in question in the component $(B)(PT)$, then we see that every term above is of the form

$$\frac{(-1)^q\,\sigma^2_{(B)(PT)}}{\prod_{i\in Z}(\text{range of } i)\;N_{(B)(PT-Z)}},$$

where $q$ is the number of letters in $Z$. The nature of the above form is one which obtains under a very wide range of experimental conditions. Its applicability forms a sufficient condition for standard $\Sigma$ expansions of the expected values of squares of partial sample means.
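The general term in this form can be evaluated mechanically once the counts $N$ are known. The sketch below assumes, hypothetically, that the counts are supplied as a function of the remaining rightmost letters; applied to $(B)(PT)$ with $N_{(B)(PT)} = N_{(B)(P)} = brt$, $N_{(B)(T)} = bt$ and $N_{(B)()} = b$, it reproduces the bracketed coefficient above.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def condition3_coefficient(Y, pop, n_values):
    """Sum over Z subset of Y of (-1)^|Z| / (prod of ranges over Z * N(Y - Z)).

    Y        -- letters of the rightmost bracket of the component
    pop      -- population range of each letter
    n_values -- function giving the number of values of type (X)(R) for a set R
    """
    total = Fraction(0)
    for q in range(len(Y) + 1):
        for Z in combinations(sorted(Y), q):
            rest = frozenset(Y) - frozenset(Z)
            total += Fraction((-1) ** q, prod(pop[i] for i in Z) * n_values(rest))
    return total


# Randomized block, component (B)(PT); sizes are hypothetical.
b, t, r, T, P = 3, 4, 2, 6, 20
N = {frozenset("PT"): b * r * t, frozenset("P"): b * r * t, frozenset("T"): b * t, frozenset(): b}
coef = condition3_coefficient("PT", {"P": P, "T": T}, lambda R: N[R])
bracket = ((1 - Fraction(1, T)) - Fraction(r, P) * (1 - Fraction(t, T))) / (b * r * t)
assert coef == bracket
```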

Formalizing then we now establish the following statement. 

Theorem 4: Consider the situations where:

(1) The sample means induced by the experimental procedure in the different types of population components are pairwise uncorrelated. Hence, the expected value of the square of an admissible observed sample mean is equal to the sum of the expected values of squares of sample means from the individual types of components.

(2) The induced samples are such that for each type of component the expected value of the square of an induced mean is equal to the corresponding component of variation divided by the number of values in the mean and multiplied by a product of correction factors specified as follows. Each correction factor corresponds to one of a number of disjoint groups of indices of the rightmost bracket, where the totality of the groups exhausts the rightmost bracket, and is of a form arising from either balanced nested, crossed, or simple fractionated samples.

Alternatively, consider the situation where:

(3) The term involving $\sigma^2_{(X)(Y)}$ in the $\sigma^2$ form of the expansion of $E\bar{x}^2$ is

$$\sigma^2_{(X)(Y)}\sum_{Z\subseteq Y}\frac{(-1)^q}{\prod_{i\in Z}(\text{range of } i)\;N_{(X)(Y-Z)}}.$$

Inspection reveals that (1) and (2) imply (3). The converse, however, does not necessarily hold. The argument on page 129 now shows that whenever, for each admissible $\sigma^2$, the corresponding term is given by (3), then

$$E\bar{x}^2 = \sum_{X,R}\frac{\Sigma_{(X)(R)}}{N_{(X)(R)}}.$$

Denote the subscripts of some admissible sample partial mean by $S_1$, and by $N_{(X)(R),S_1} = N_{S_2,S_1}$ the number of possibly different components of type $S_2 = X + R$ entering into the formation of $\bar{x}_{S_1}$. Then it follows that

$$E\bar{x}_{S_1}^2 = \sum_{X,R}\frac{\Sigma_{(X)(R)}}{N_{(X)(R),S_1}}. \qquad \text{Q.E.D.}$$


With regard to tables of the sample analysis of variance we note that their subdivision into individual lines is completely determined by the internal structure imposed on the sample set of indices. Each admissible sample mean becomes the forming or leading term of one of the lines of the table, and with balance the total sum of squares decomposes additively into the sum of the sums of squares belonging to the individual lines.

In what follows an attempt is made to derive sufficient conditions for the expected values of the mean squares of the lines of analysis of variance tables to be "simple" in terms of the $\Sigma$'s. The conditions are similar to the ones needed in cases where cross and nested samples only are involved. However, because with randomization the correspondence between sample and population subscripts is no longer one-one throughout, the additional concept of ambivalence, attempting to emphasize the possible relationships, has been introduced. It is believed that as more understanding of the situations is gained, a simpler descriptive scheme will be formulated. Work on such a scheme is now in progress.


Definition: A sample subscript $i^*$ and a population subscript $j$ are said to be in the relation of complete ambivalence if summation in the observed sample values over the range of $i^*$ always implies that the size of the induced sample in every one of the population components involving $j$ is multiplied by the sample range of $i^*$.


As a concrete example consider again the case of the randomized block design.

The population decomposition of $Y_{ijk}$ into population components and the statistical model for $x_{i^*k^*f}$ are as on pages 135 and 136. It is easy to see that any summation of the $x_{i^*k^*f}$'s over the range $t$ of the sample index $k^*$, in addition to previous summations over $i^*$ and/or $f$, causes the induced sample size of every population component involving the subscript $k$ to be $t$ times as large. The same happens for every population component involving the subscript $j$. Thus, in a certain sense, the sample subscript $k^*$ corresponds completely to the population subscripts $j$ and $k$. We say that $k^*$ is in complete ambivalence relation with $j$ and $k$, or equivalently with the corresponding population entities P and T.


Definition: The full sets of sample and population subscripts are said to be 
in the relation of complete ambivalence if whenever summation over the range of 
any one of the sample indices is effected, then independently of the state of the other 
indices the size of the induced sample in every one of the types of population compo- 


nents either remains unchanged or is multiplied by the range of the sample index 
in question. 


Thus, referring again to page 136, we verify that summation in the sample observations over the range of $k^*$ always induces the multiplicative factor $t$ in every population component involving $k$ or $j$, and induces no change in the sample size of any of the other population components. Similarly, summation over the sample range, $r$, of the index $f$ involves multiplication by $r$ of the induced sample size of every population component having $j$ for a subscript, and involves no change in the induced sample size of the other population components. Finally, summation in the observed values over the sample range of $i^*$ induces in a similar way a change in all population components having $i$ or $j$ for subscripts, and induces no change in any of the other components. Thus, in this example the full sets of sample and population subscripts are in the relation of complete ambivalence.

We mention in passing that in the case of the latin square the reasonable full set of sample subscripts is not in complete ambivalence relation with the full set of population subscripts. Hence, the theorem below on the expected values of mean squares cannot be applied as it stands to the latin square case. However, the expected values of squares of admissible sample means still admit the standard $\Sigma$ expansion. Hence, expressions for the expected mean squares can be easily obtained for the latin square by direct manipulation. In like manner, if one does not wish to make use of the notion of complete ambivalence in the concrete examples encountered in practice, one can still obtain easily all of the expected mean squares by computing simple linear combinations of the relevant $\Sigma$ forms.


Consider the set of experiments for which the expected value of the square of the overall sample mean admits a standard $\Sigma$ expansion. Suppose further that the set of sample subscripts used is in complete ambivalence relation with the set of population subscripts. Let $\bar{x}_{S_1}$ be the leading term of a particular line of the sample analysis of variance table and $Y_1$ be the set of rightmost bracket subscripts of $S_1$. Let $S_2$ denote the set of subscripts belonging to a typical population component.


Theorem 5: The expected value of the mean square whose leading term is $x_{S_j}$ is $\sum_{S_\alpha} P(S_\alpha)\,\Sigma_{S_\alpha}$, where $P(S_\alpha)$ has values as follows: $P(S_\alpha) = 0$ if and only if at least one of the subscripts of $Y_j$ is not in complete ambivalence relation with any of the subscripts of $S_\alpha$; otherwise $P(S_\alpha)$ = number of times a value of a typical component of type $S_\alpha$ enters into the formation of the complete observed sample.


Proof: Let $S_\alpha$ be the set of subscripts defining any one particular type of component. Then, by the argument of the theorem on expected mean squares of Section 3, the coefficient of $\Sigma_{S_\alpha}$ in the expected value of the mean square is zero if at least one of the subscripts of $Y_j$ is not in ambivalence relation with any of the subscripts of $S_\alpha$. Again, if every subscript of $Y_j$ is in ambivalence relation with at least one of the subscripts of $S_\alpha$, then the coefficient of $\Sigma_{S_\alpha}$ in the expected value of the sum of squares corresponding to $S_j$ is

$$\Big(\prod n_{i^*}\Big) \times (\text{D.F. of } S_j),$$

where the product runs over the indices $i^* \in S^* - S_j$ that are not in ambivalence relation with any of the subscripts of $S_\alpha$, and $n_{i^*}$ denotes the sample range of $i^*$.
Hence the coefficient of $\Sigma_{S_\alpha}$ in the expected value of the mean square corresponding to $x_{S_j}$ is equal to the product of sample ranges of all indices in $S^* - S_j$ which are not in ambivalence relation with any of the subscripts of $S_\alpha$. However, because of the way in which admissible means are constructed, every subscript of $S_j$ is in ambivalence relation with at least one of the subscripts of $S_\alpha$. Hence, the required number is equal to the product of sample ranges of indices of $S^*$ which are not in ambivalence relation with any of the subscripts of $S_\alpha$, and so is equal to the number of times a typical selected value of the component of type $S_\alpha$ enters into the formation of the complete observed sample. q.e.d.

Corollary (i): Whenever $\Sigma_{S_\alpha}$ appears with non-zero coefficient in the expected value of a mean square, then so does $\Sigma_{S_\beta}$ for every $S_\beta \supset S_\alpha$.

Corollary (ii): The expected value of a mean square with rightmost bracket of the leading term $Y_j$ is equal to the common part (i.e. intersection) of the expected mean squares of any two lines for which the respective sets of rightmost bracket subscripts $U_j$ and $W_j$ are such that

$$U_j + W_j = Y_j.$$

It should be noted that whenever a sample subscript is in ambivalence relation with a given population subscript, say K, then it is also in ambivalence relation with every subscript nested in K. For this reason we shall indicate explicitly the relationship with K only. Further, if the index K corresponds to the population entity, say T, then we shall also say that the sample index is in ambivalence relation with T.

In the case of a fixed set of treatment effects, the following simple result is 
of value. 


Let each treatment enter into the experiment K times. Denote by T* the mean square due to treatments, and by $\sigma_T^2$ the treatment component of variation. Then the average variance of the natural unbiassed estimates of the treatment differences is given by

$$\frac{2}{K}\left\{E(T^*) - K\sigma_T^2\right\}.$$
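As a numerical check of this formula, here is a minimal Python sketch. It assumes the simplest setting the result covers: a balanced one-way completely randomized layout with hypothetical fixed treatment effects, K replicates per treatment and error variance σ². Under those assumptions E(T*) = σ² + Kσ_T², with σ_T² = Σ(τ_i − τ̄)²/(g − 1), so the formula should give 2σ²/K. The effects, K, σ and the seed are illustrative choices.

```python
import random
import statistics

def treatment_mean_square(y):
    """Between-treatment mean square T* for a balanced one-way layout.

    y is a list of g rows, each holding the K observations on one treatment."""
    g, K = len(y), len(y[0])
    means = [statistics.fmean(row) for row in y]
    grand = statistics.fmean(means)
    return K * sum((m - grand) ** 2 for m in means) / (g - 1)

def average_difference_variance(expected_tstar, K, sigma_T2):
    """The formula (2/K) * {E(T*) - K * sigma_T^2}."""
    return 2.0 / K * (expected_tstar - K * sigma_T2)

random.seed(1)
tau = [0.0, 1.0, 3.0, 4.0]      # hypothetical fixed treatment effects
K, sigma = 5, 2.0               # replicates per treatment, error s.d.
g = len(tau)
tau_bar = sum(tau) / g
sigma_T2 = sum((t - tau_bar) ** 2 for t in tau) / (g - 1)

# Monte Carlo estimate of E(T*) over repeated experiments.
reps = 20_000
mean_tstar = statistics.fmean(
    treatment_mean_square([[t + random.gauss(0.0, sigma) for _ in range(K)] for t in tau])
    for _ in range(reps)
)
est = average_difference_variance(mean_tstar, K, sigma_T2)
print(f"formula: {est:.3f}   exact 2*sigma^2/K: {2 * sigma**2 / K:.3f}")
```

The Monte Carlo value of E(T*) stands in for the exact expectation, so the two printed numbers agree only up to simulation error.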


It should be noted that the theorem of the present section on expected mean squares contains the better known one involving balanced crossed and nested samples only as a special case, one which is also simple to state. In the appendix we exemplify with three analyses of variance, two involving the relations of crossing and nesting only, and the last involving also the relation of simple fractionation in a situation in which the treatment levels are subject to error. A number of other such tables may be found in (Wilk, 1955), (Zyskind, 1958), (Zyskind and Kempthorne,
ON STRUCTURE, RELATION, Σ, AND EXPECTATION OF MEAN SQUARES


Appendix
SOME EXAMPLES
Example 1: Envisage an investigation of the fertility structure with respect to some uniform treatment of a specified agricultural land area. Suppose the initially postulated structure is
(S:(R)(C:L)).

Suppose that the sampling operation consists of randomly selecting s sources out of S, in each selected source choosing randomly r rows out of the R and c columns out of the C, and then within each chosen column choosing l strips out of L. We exhibit the complete analysis of variance in Table 1.
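As a quick consistency check on the degrees of freedom in Table 1, the following Python sketch lists the d.f. of each line (as read from the legible d.f. column of the table) and confirms that they add up to the total number of observations, s·r·c·l. The particular values of s, r, c and l in the example call are arbitrary.

```python
def table1_df(s, r, c, l):
    """Degrees of freedom for the analysis of variance of (S:(R)(C:L))
    with s sources, r rows and c columns per source, and l strips per column."""
    return {
        "mean": 1,
        "sources": s - 1,
        "rows within sources": s * (r - 1),
        "columns within sources": s * (c - 1),
        "(rows x columns) within sources": s * (r - 1) * (c - 1),
        "strips within columns within sources": c * s * (l - 1),
        "(rows x strips) within columns within sources": s * c * (r - 1) * (l - 1),
    }

df = table1_df(s=3, r=4, c=5, l=2)
for source, d in df.items():
    print(f"{source:48s} {d}")
print("total:", sum(df.values()))  # equals s*r*c*l = 120
```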


Example 2: Four entities are involved in the structure. Denote them by P, Q, S, and R. The set of relations is: Q is nested in S, and R is nested in SP combinations. Symbolically:
(S:Q)(P) and (SP:R).
Using these letters for subscripts, we see that the population identity for the typical response is

$$
\begin{aligned}
Y_{spqr} = {} & \bar Y + (\bar Y_{s} - \bar Y) + (\bar Y_{p} - \bar Y) + (\bar Y_{sp} - \bar Y_{s} - \bar Y_{p} + \bar Y) \\
& + (\bar Y_{s(q)} - \bar Y_{s}) + (\bar Y_{s(q)p} - \bar Y_{s(q)} - \bar Y_{sp} + \bar Y_{s}) \\
& + (\bar Y_{sp(r)} - \bar Y_{sp}) + (Y_{spqr} - \bar Y_{s(q)p} - \bar Y_{sp(r)} + \bar Y_{sp}).
\end{aligned}
$$

Denote the respective population ranges of the subscripts by S, P, Q and R. Then, using the obvious notational correspondence, we verify that the Σs are as given on page 122.
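The population identity is a telescoping decomposition, so it can be checked numerically on any finite population. The sketch below assumes a hypothetical population stored as a NumPy array with axes (s, p, q, r), where the q index is read as nested in s and the r index as nested in the sp combination. Each bracket of the identity is built from means over the appropriate axes, and the eight components are summed back.

```python
import numpy as np

rng = np.random.default_rng(0)
S, P, Q, R = 3, 4, 2, 5
# Axes: s, p, q (q nested in s), r (r nested in the sp combination).
Y = rng.normal(size=(S, P, Q, R))

def mean_over(*axes):
    """Population mean over the given axes, kept broadcastable."""
    return Y.mean(axis=axes, keepdims=True)

Ybar = mean_over(0, 1, 2, 3)
Y_s = mean_over(1, 2, 3)
Y_p = mean_over(0, 2, 3)
Y_sp = mean_over(2, 3)
Y_sq = mean_over(1, 3)      # \bar Y_{s(q)}
Y_sqp = mean_over(3)        # \bar Y_{s(q)p}
Y_spr = mean_over(2)        # \bar Y_{sp(r)}

components = {
    "mean": Ybar,
    "S": Y_s - Ybar,
    "P": Y_p - Ybar,
    "SP": Y_sp - Y_s - Y_p + Ybar,
    "S(Q)": Y_sq - Y_s,
    "S(Q)P": Y_sqp - Y_sq - Y_sp + Y_s,
    "SP(R)": Y_spr - Y_sp,
    "SP(QR)": Y - Y_sqp - Y_spr + Y_sp,
}
reconstructed = sum(components.values())   # broadcasts back to shape (S, P, Q, R)
print("max abs error:", np.abs(reconstructed - Y).max())
```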


Envisage a sampling investigation conducted on the above structure. Denote the sample ranges by small letters corresponding to the capital letters used for the population ranges. We exhibit the resulting analysis of variance in Table 2.


Example 3: We now consider a specialized procedure in an experiment in generalized randomized blocks when the application of intended amounts of treatments may be subject to error. The errors in the treatments are considered as treatment sublevels nested within treatment levels. A more detailed treatment of the following may be found in Zyskind and Kempthorne (1960).


The population structure is envisaged to be as follows. The experimental material has a nested structure: that of the blocks, B, nesting in them the ultimate experimental units, P (for plots). Again, the main levels of the treatment entity, G, nest within them the treatment sublevels, g. The population structure may, therefore, be symbolically represented by
(B:P)(G:g).

We denote the respective indices ranging over the elements of these entities by i, j, k, and m. Also, except for m, we denote the ranges of these indices by the capital letters denoting the entities themselves; thus i = 1, 2, ..., B; j = 1, 2, ..., P; k = 1, 2, ..., G; also m = 1, 2, ..., M. An alternative representation of the population structure therefore is
(i:j)(k:m).

The experimental procedure consists of choosing, firstly, b blocks at random out of B and g treatment levels (or treatments) out of G. Further, suppose that the b chosen blocks are arranged at random into w groups of q blocks each (wq = b). A treatment level is applied simultaneously to all the rq plots within a group of blocks which receive it. Hence, treatment sublevels are randomly confounded with blocks, since the blocks are arranged into groups at random.
We denote by $x_{i^*m^*k^*f}$ the observed sample value in the m*-th chosen block in the i*-th selected group, of the f-th repetition of the k*-th chosen treatment. The sets of random variables we introduce are:
iem, S ke, в дете 
(5; » GA » Жан ), (Р веј ) 




SANKHYA: THE INDIAN JOURNAL OF STATISTICS: Series A


The meaning and elementary properties of these variables should be apparent from the context. The statistical model is


ieme e eie i*m*k*f 
- qtue Y, 
Жөн = = В, Ж; Pa Promo ijkm 


i i*m* . pi" с ва 
НОВ" + 2 98 Get 2 B ™ 6, (ВО 


i*m* | i*m*k*f L itm * етеде PG) 
FREU ыу ut EB” бу Рту Pash 


е к; e Um* dei 
B Bg) 
+E dus mem BUT gue (Bom 


, pitm* pke бетерер keis З 
Tox Pt 9 Р ете ГЩ (Pg)ueGm- 


Using the properties of the random variables or by inspecting the types of samples induced it can be 
seen that 


ANEN ER 1 1 i 
Зи Tay Ру 2007 Ьу 2080) ig iy 


ls Bu 1 

* eq Taea ug oo 1 g ox 
DOUNA 

тюу (BOP 


Further, since the ambivalence relation between the set of sample and population subscripts is complete, all of the forms for expected values of mean squares exhibited in Table 3 follow as a direct consequence of Theorem 5. Alternatively, Table 3 may be easily obtained by direct manipulation of expected values of squares of the appropriate sample means. In Table 3, Σ' stands for the sum of all Σ's whose sets of subscripts involve P. We note that with a notation which ignored the fact that groups of blocks are treated alike with respect to the sublevels, the complete forms of the ensuing analysis of variance table would no longer be "simple".


The more frequently treated case in which there are no treatment errors may be easily obtained from the present one by deleting from consideration all components involving treatment sublevels, i.e. all components involving the subscript g.


TABLE 1. THE SAMPLE ANALYSIS OF VARIANCE FOR THE STRUCTURE (S:(R)(C:L)) 
source of variation    d.f.    e.m.s.
and leading mean 
mean Kig) 1 tres X pt rg кет l 20 T 208) 
Troyon T Эзусун 
81. 
= Иво) + Ire (1-5) 5 pato (1-8) ољ 
с f с 62 
+ (1-е) ву + (-z) (2-6) о 
1). PAY UE ee 
вон (2) oe 
sources X(s) 8—1 lre 25) + le Zor + Zsa) +1. 2 (А0) +r 2 sonny 
+ Э(зуоувг) 
Sof [4 2 
miro oe hall (1-4) imt r (1-2) о 
1 
+ ( 1-x) ( 1-2) Pomot" (1-4) yon) 
r l ы 
+ ( 1 -&) ( -t) "(SXOXRI) 
rows within s(r—1) lc ®(вук) + Zeno ^ 2 soy RD) 
sources X 
(SXR) 
а ELA od. ( 1-1) o 
= 10 о(зуву t! (1—0) %syRe) \ L/ Gxomn 
columns within а(0—1) lr Z soy T! igno) ^^ Zoon Т Peon 
sources X 
(SX CY р 
2 
=e o+! ( 1-5) оно + ( 1-2) СО 
+ ( 1 -®) (1-2) "(Sy Cy) 
Xe within s(r—160—13) 1 Зо + Poor 
sources X SRO) 
1 
ПЕ, гово ( i-i) S(SXOXRD) 
strips within cs(1—1) 5 хвор Z SORD) 
columns within 
sources Х SoD) 
Ж fount (1 ж) эск» 
—1)(1—1) 
(rows x strips) sc(r—1)(1 adt 
within columns within (SK O(RD) 'SXOXRD) 
sources X SORI) i 
TABLE 2. ANALYSIS OF VARIANCE FOR THE STRUCTURE (S:Q)(P) AND (SP:R)


leading mean d.f. e.m.s. 
Xo 1 pars Zit PI Botas Bp ta Sep +P" о) 
"уро tT 2gpym=t (вру) 
ua 2 
= pars oto --par (-? ) eism (1-Е) ЧР) 
а Э |2 9 р 9 | 23 
+ (1-5) (1-6) ьн (1-0) Фо" (1-5) ( -$) %s\PQ) 
0 ES — 1 |? 
(1 x ) рю (: $) (1 HL 
x = 
© x э Bg n Е gp) НР" Zio) +” зуроу 9 esp (рхо) 
b p а) 2 
= par 95у ( -&) tsp (1-5) о 
е e 209 2 Е T 
+(! Р ) ( &) syra ta (1 Е) "(spy (1-&) (:-; ) Фрун) 
х 2i 
P) z 18 2 уе (gp +P уро) Ч руву зрур) 
= 8 
= ани (1-2) er (1-2) to 
= |02, 9 жуса 
+a(1 x) + (1-2) (1-4) V(SPXQR) 
x 's—1)(p—1)* 
(SP) (MPN! Bop +? 2 урау Zispym Sispyo) 
5 9 т 
nid d ( iud ) оч К -&) spy) 
q r 
+ (1-6) i78) вною 
Хо) 5(4—1) туду" уро Zispyon =P" 9s (1-5) бу Ро) 
Le 
Rr ( -®) "(spXQn) 
Хоро  #@—10Ф—1) "уро, +-55руову =" урф) + ( Е ) "(spyQR) 
Хар) P0—D 9 Zispym) + Хрон) = 4 (зрур) + ( 1-$ ) "(SPXQR) 
Xuspyom #Р(Ф—Щт—1) Z«spyom = (арув) 
TABLE 3.* ANALYSIS OF VARIANCE FOR A RANDOMIZED BLOCK DESIGN IN WHICH
q BLOCKS RECEIVE THE SAME ERROR FOR EACH TREATMENT LEVEL AND
w SUCH GROUPS OF BLOCKS ARE CHOSEN ALTOGETHER


Population structure and experimental procedure (B:P)(G:g); sample structure ((i*:m*)(k*):f); ambivalence relations i* → B, g; m* → B; k* → G, P; f → P


source of variation e.m.s. 
and leading mean 


between groups 79 5 TtZ > , 
of blaska ga ОШ 12a +" Жу” суву" Xe) 02 
v 


2 T 2 
Tn ( 1-8 ) Mayan) "t ох ( 1-5) (Bey Po) 


ithi d 
within groups rgd Tf Z Bo +” > TE 


of blocks g. (в) 


i*(m*) 
=nom +r ( 1-2 ) ES ( 1-9 ) оуу 


(6) (Bo) 


3 1 2 
C9) С) deni tam 


ы 2 
ал ( 1-5) (BoyPo) 


> xy 
between treatments 2, 7b 20) + (во) +" Xy) +179 Eoo $ 


=rb olg +1 ( 1-5 ) обв) + вур) 


r 2 
оу о) Ame ear а 


treatmentsxgroups r Хве) +" Хоув +74 2000 + > 
of blocks Pepe 
v 


1 T 
=" обв) + (вур) + ( cer ) -5) ciaeo 
T 2 
+( 1-5 ) (ву +79 Maya) + ( Van: ) (Ba) Pa) 


р и х 
treatments xblocks r £ +r (сув) * 


LIS (BG) 
within groups 2 (тее) 


1 1 
=r opat UP) ( (1-5 ) - 7) ғо + "(двд 


, 


1 
= ойр) (1-5 ) овур) +©(ввуру) 


error 2 
i*m*k*(f) 


* The number of potential sublevels, M, within a treatment level, is taken to be infinite in all the expressions for e.m.s.'s of the present table.
USE OF DISCRIMINANT AND ALLIED FUNCTIONS IN
MULTIVARIATE ANALYSIS


By C. RADHAKRISHNA RAO
Indian Statistical Institute
SUMMARY. The use of a discriminant function is justified only when it is known that the individual to be classified belongs to a certain subset of alternative groups. In this paper statistical methods are developed for examining whether a given individual can be supposed to have arisen from one of the populations in the subset, and then using the discriminant function.
The use of certain other linear functions in problems of classification of groups, prediction, etc. has also been discussed.
1. INTRODUCTION 
That the likelihood ratio is the best criterion for discrimination between two alternative hypotheses is justified by the fact that it is a sufficient statistic when only the two alternatives are considered (Smith, 1947). But, in practice, when it is not certain that an observed specimen belongs to one or the other of two given groups, there is need to consider the possibility of its belonging to some unknown group. The likelihood ratio, which is equivalent to the discriminant function, may not be sufficient when other alternatives are considered. Elsewhere (Rao, 1960), I have considered an example which gave rise to some controversy because such a possibility was ignored and a discriminant function was used to decide between two given alternatives (Bronowski and Long, 1951; Yates and Healy, 1951).
It is, therefore, important first to examine whether the discriminant function constructed on the basis of two given alternatives is adequate for drawing inference on an observed specimen. Only when this is confirmed by the observations on the specimen can we actually use the discriminant function for coming to a decision. The paper is devoted to the development of suitable tests for this purpose.
Concepts of size and shape and their measurement based on multiple variables are also considered. These functions have been found useful in the study of inter-relationships and in the problem of discrimination between groups.

2. SUFFICIENCY OF THE DISCRIMINANT FUNCTION OVER A WIDER SET OF ALTERNATIVES
Let us consider two multivariate normal distributions with parameters $(\mu_1, \Lambda)$ and $(\mu_2, \Lambda)$, where $\mu_1, \mu_2$ are vectors of mean values and $\Lambda$ is the common dispersion matrix. The log likelihood ratio, which is the linear discriminant function based on these two alternatives, is proportional to

$$\delta'\Lambda^{-1}x \qquad \ldots (2.1)$$

where $\delta = \mu_1 - \mu_2$, $\Lambda^{-1}$ is the reciprocal of $\Lambda$ and $x$ is the vector of observations. Lemma 1, which may be considered as a generalization of Smith's result, shows that the discriminant function (2.1) is sufficient not only for the alternatives from which it is derived but also for a wider set.
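To make (2.1) concrete, the Python sketch below computes the log likelihood ratio of two normal densities with a common dispersion matrix and subtracts the discriminant $\delta'\Lambda^{-1}x$. What remains is the constant $-\tfrac{1}{2}(\mu_1'\Lambda^{-1}\mu_1 - \mu_2'\Lambda^{-1}\mu_2)$, the same for every x. The dispersion matrix and means are hypothetical random values.

```python
import numpy as np

def log_normal_density(x, mu, Lam):
    """Log density of N(mu, Lam) at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Lam)
    quad = d @ np.linalg.solve(Lam, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def discriminant(x, mu1, mu2, Lam):
    """Linear discriminant function delta' Lam^{-1} x with delta = mu1 - mu2."""
    return (mu1 - mu2) @ np.linalg.solve(Lam, x)

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p))
Lam = A @ A.T + p * np.eye(p)   # a hypothetical positive definite dispersion matrix
mu1, mu2 = rng.normal(size=p), rng.normal(size=p)

for x in rng.normal(size=(3, p)):
    llr = log_normal_density(x, mu1, Lam) - log_normal_density(x, mu2, Lam)
    print(llr - discriminant(x, mu1, mu2, Lam))  # the same constant each time
```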
Lemma 1: The discriminant function $\delta'\Lambda^{-1}x$ is sufficient for the set of alternatives $(a_1\mu_1 + a_2\mu_2, \Lambda)$ where $a_1 + a_2 = 1$, i.e., the probability distribution of $x$ given $\delta'\Lambda^{-1}x$ is the same for all expected values of $x$ lying on the line joining $\mu_1$ and $\mu_2$.
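To prove the lemma it is enough to show that the ratio

$$\frac{P(x \mid a_1\mu_1 + a_2\mu_2, \Lambda)}{P(\delta'\Lambda^{-1}x \mid a_1\mu_1 + a_2\mu_2, \Lambda)} \qquad \ldots (2.2)$$

of normal densities is independent of $(a_1, a_2)$ provided $a_1 + a_2 = 1$. The logarithm of (2.2) is proportional to

$$(x - a_1\delta - \mu_2)'\Lambda^{-1}(x - a_1\delta - \mu_2) - \frac{\{\delta'\Lambda^{-1}(x - \mu_2) - a_1\,\delta'\Lambda^{-1}\delta\}^2}{\delta'\Lambda^{-1}\delta} \qquad \ldots (2.3)$$

where $a_2$ has been replaced by $(1 - a_1)$. The coefficients of $a_1^2$ and $a_1$ in (2.3) are obviously zero, establishing the required result.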
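The cancellation in (2.3) can be confirmed numerically. This Python sketch evaluates expression (2.3) for several values of $a_1$, with a hypothetical dispersion matrix, means and observation. Since the $a_1^2$ and $a_1$ terms cancel, all values coincide.

```python
import numpy as np

def log_ratio_23(x, a1, mu1, mu2, Lam):
    """Expression (2.3): the quadratic form in x minus the squared
    standardized discriminant, for mean a1*mu1 + (1 - a1)*mu2."""
    delta = mu1 - mu2
    Li_delta = np.linalg.solve(Lam, delta)      # Lam^{-1} delta
    D2 = delta @ Li_delta                       # delta' Lam^{-1} delta
    d = x - a1 * delta - mu2
    quad = d @ np.linalg.solve(Lam, d)
    disc = Li_delta @ (x - mu2) - a1 * D2       # delta' Lam^{-1} (x - mu2 - a1 delta)
    return quad - disc**2 / D2

rng = np.random.default_rng(2)
p = 5
A = rng.normal(size=(p, p))
Lam = A @ A.T + np.eye(p)
mu1, mu2, x = rng.normal(size=(3, p))
values = [log_ratio_23(x, a1, mu1, mu2, Lam) for a1 in (-1.0, 0.0, 0.3, 1.0, 2.5)]
print(values)  # all equal up to rounding
```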

Lemma 2, which gives an extension of the result of Lemma 1 to the case of k multivariate normal populations with parameters $(\mu_i, \Lambda)$, $i = 1, \ldots, k$, can be proved on the same lines. Let $L_1, \ldots, L_{k-1}$ be $(k-1)$ independent likelihood ratios or $(k-1)$ independent discriminant functions arising out of some $(k-1)$ pairs of populations.

Lemma 2: $L_1, \ldots, L_{k-1}$ are sufficient for the parameter set $a_1\mu_1 + \cdots + a_k\mu_k$, where $a_1 + \cdots + a_k = 1$.


3. TEST PRIOR TO THE APPLICATION OF DISCRIMINANT FUNCTION 


As observed earlier, in any practical problem it is of some importance to consider the possibility of an observed specimen belonging to a group other than any one of those specified. For this purpose, it appears more natural first to test the hypothesis that the discriminant function is sufficient for drawing an inference on the observed specimen. Only if there is no evidence against this hypothesis could we use the discriminant function for further study; otherwise the question of classification of the observed specimen as a member of one of the two given groups does not arise. Again, we do not use the discriminant function just to decide between the two given alternatives only, but also allow yet another possibility of the observed specimen belonging to a third group with its mean values lying on the line joining the mean values of the two given groups.

We shall first consider the case where all the parameters are known and the 
result of Lemma 1 regarding the sufficiency of the discriminant function is strictly valid. 

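The probability density of the observation $x$ for an arbitrary mean $\mu$ is

$$P(x \mid \mu) = \text{const.}\,\exp\{-\tfrac{1}{2}(x - \mu)'\Lambda^{-1}(x - \mu)\}. \qquad \ldots (3.1)$$

The criterion for testing the hypothesis that $\mu = a_1\mu_1 + a_2\mu_2$, where $\mu_1$ and $\mu_2$ are specified, is provided by the likelihood ratio,

$$
\begin{aligned}
\chi^2 &= -2\log_e \frac{\sup_{a_1} P(x \mid a_1\mu_1 + a_2\mu_2)}{\sup_{\mu} P(x \mid \mu)} && \ldots (3.2)\\
&= \inf_{a_1}\,(x - \mu_2 - a_1\delta)'\Lambda^{-1}(x - \mu_2 - a_1\delta) \\
&= (x - \mu_2)'\Lambda^{-1}(x - \mu_2) - \frac{\{\delta'\Lambda^{-1}(x - \mu_2)\}^2}{\delta'\Lambda^{-1}\delta} && \ldots (3.3)\\
&= (x - \mu_1)'\Lambda^{-1}(x - \mu_1) - \frac{\{\delta'\Lambda^{-1}(x - \mu_1)\}^2}{\delta'\Lambda^{-1}\delta} \quad \text{(by symmetry)} && \ldots (3.4)
\end{aligned}
$$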
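The passage from the infimum to the closed forms (3.3) and (3.4) can be checked with a short Python sketch. It compares both closed forms with a brute-force minimisation over a fine grid of $a_1$ values. The equicorrelated dispersion matrix, the means and the observation are hypothetical values chosen so that the minimising $a_1$ lies well inside the grid.

```python
import numpy as np

def chi2_from(x, base, delta, Lam_inv):
    """(x-base)' Lam^{-1} (x-base) - {delta' Lam^{-1} (x-base)}^2 / delta' Lam^{-1} delta."""
    d = x - base
    return d @ Lam_inv @ d - (delta @ Lam_inv @ d) ** 2 / (delta @ Lam_inv @ delta)

p = 4
Lam = 0.5 * np.eye(p) + 0.5          # hypothetical equicorrelated dispersion (rho = 0.5)
Lam_inv = np.linalg.inv(Lam)
mu1 = np.array([1.0, 0.5, -0.3, 2.0])
mu2 = np.zeros(p)
x = np.array([0.4, 1.2, 0.7, -0.5])
delta = mu1 - mu2

chi2_33 = chi2_from(x, mu2, delta, Lam_inv)   # (3.3)
chi2_34 = chi2_from(x, mu1, delta, Lam_inv)   # (3.4)

# Brute-force infimum over a_1 of the quadratic form on a fine grid.
a = np.linspace(-10.0, 10.0, 200_001)
resid = (x - mu2)[None, :] - a[:, None] * delta[None, :]
brute = np.einsum("ij,jk,ik->i", resid, Lam_inv, resid).min()
print(chi2_33, chi2_34, brute)
```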
DISCRIMINANT AND ALLIED FUNCTIONS IN MULTIVARIATE ANALYSIS
The statistic (3.3) is the difference between

$$\chi_1^2 = (x - \mu_2)'\Lambda^{-1}(x - \mu_2) \qquad \ldots (3.5)$$

which is a $\chi^2$ on p d.f. and provides a test of the hypothesis that the individual belongs to the second population when nothing is known about the alternatives, and

$$\chi_2^2 = \frac{\{\delta'\Lambda^{-1}(x - \mu_2)\}^2}{\delta'\Lambda^{-1}\delta} \qquad \ldots (3.6)$$

which is a $\chi^2$ on 1 d.f. and is useful for examining whether the individual belongs to the second population given that the alternatives for mean values are points on the line joining $\mu_1$ and $\mu_2$. We may then write the required test criterion (3.3) as

$$\chi_3^2 = \chi_1^2 - \chi_2^2 \qquad \ldots (3.7)$$

which is a $\chi^2$ on $(p - 1)$ d.f. If $\chi_3^2$ is significantly large, there is evidence of the observed specimen belonging to a third group. On the other hand, if the value of $\chi_3^2$ is not large, there is some assurance that the use of the discriminant function will not be misleading. The closeness of the observed specimen to one or the other of the two groups is then judged by the discriminant function or, equivalently, by the chi-squares

$$\frac{\{\delta'\Lambda^{-1}(x - \mu_1)\}^2}{\delta'\Lambda^{-1}\delta}, \qquad \frac{\{\delta'\Lambda^{-1}(x - \mu_2)\}^2}{\delta'\Lambda^{-1}\delta},$$

each with 1 d.f. As observed earlier, both these chi-squares may be large in a particular situation, indicating another possibility of the observed specimen belonging to a third group.

For example, in the classification of the Highdown Skull (Rao, 1952, pp. 294-296), considering the Bronze Age population we find $\chi_1^2 = 12.694$ and $\chi_2^2 = 3.993$, giving a difference $\chi_3^2 = 8.701$ on 5 d.f. Considering the alternative of the Maiden Castle population, $\chi_1^2 = 9.091$ and $\chi_2^2 = 0.390$, giving the same difference $\chi_3^2 = 8.701$ on 5 d.f., which is not significantly high. We may then use the $\chi_2^2$ values on 1 d.f. for arriving at a decision. The $\chi_2^2$ for the Bronze Age is significant at the 5% level and that for Maiden Castle is small, which suggests a classification of the Highdown Skull as a member of the latter population.
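The arithmetic of this example is easy to reproduce. The Python sketch below takes the reported $(\chi_1^2, \chi_2^2)$ pairs, forms $\chi_3^2$ by (3.7), and compares each statistic with the upper 5% points of $\chi^2$ on 5 and 1 d.f. Those critical values are hard-coded standard table values, not taken from the paper.

```python
# Upper 5% points of chi-square (standard table values, rounded).
CHI2_05 = {1: 3.841, 5: 11.070}

# (chi_1^2, chi_2^2) reported in the text for each candidate population.
reported = {"Bronze Age": (12.694, 3.993), "Maiden Castle": (9.091, 0.390)}
df3 = 5   # d.f. of chi_3^2 = p - 1, as stated in the text

summary = {}
for group, (chi1, chi2) in reported.items():
    chi3 = round(chi1 - chi2, 3)           # (3.7): chi_3^2 = chi_1^2 - chi_2^2
    summary[group] = (chi3, chi3 > CHI2_05[df3], chi2 > CHI2_05[1])
    print(f"{group:14s} chi3^2 = {chi3:.3f} (signif. {summary[group][1]}), "
          f"chi2^2 = {chi2:.3f} (signif. {summary[group][2]})")
```

Neither $\chi_3^2$ exceeds 11.070, so the discriminant function may be used. Only the Bronze Age $\chi_2^2$ exceeds 3.841, which matches the conclusion in the text.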

Bronowski and Long (1952) amended the procedure, suggested by them in the earlier paper (1951), of using the discriminant function without a preliminary test of its sufficiency, to admit the possibility of an observed specimen belonging to a third group. Their new approach, however, does not make use of the discriminant function at all.

4. TESTS WHEN THE NUMBER OF ALTERNATIVES IS MORE THAN TWO

As before, we consider the case where the alternative distributions are completely specified, i.e., the parameters involved are all known. In view of the result of Lemma 2, it may be examined first whether the (k − 1) independent likelihood ratios or discriminant functions arising out of k specified distributions are sufficient or not; in other words, whether the observed specimen can be considered to have arisen from a population whose mean is on the hyperplane determined by the means of the k specified populations. If, as before, $x$ stands for the vector of observations, $\mu_1, \ldots, \mu_k$ for the mean values and $\Lambda$ for the common dispersion matrix, the test criterion is

$$\chi^2 = \min_{\sum a_i = 1}\Big(x - \sum a_i\mu_i\Big)'\Lambda^{-1}\Big(x - \sum a_i\mu_i\Big) \qquad \ldots (4.1)$$

which is a $\chi^2$ on $(p - k + 1)$ d.f. A method of computing a statistic of the form (4.1) is discussed in an earlier paper (Rao, 1959). If the value of $\chi^2$ in (4.1) is not significantly high, we consider only (k − 1) discriminant functions instead of p variables. For each group we then have a $\chi^2$ on (k − 1) d.f. to judge the affinities of the observed specimen with that particular group.
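One way to compute (4.1) is sketched below in Python. It is not necessarily the method of Rao (1959). Writing $\sum a_i\mu_i = \mu_k + \sum_{i<k} a_i(\mu_i - \mu_k)$ removes the constraint $\sum a_i = 1$, and whitening by a Cholesky factor of $\Lambda$ turns the minimisation into ordinary least squares. The means, dispersion matrix and specimen are hypothetical.

```python
import numpy as np

def hyperplane_chi2(x, mus, Lam):
    """Criterion (4.1) and the minimising weights a (which sum to one).

    mus has shape (k, p). Writing sum a_i mu_i = mu_k + sum_{i<k} a_i (mu_i - mu_k)
    removes the constraint, and whitening with the Cholesky factor of Lam turns
    the problem into ordinary least squares."""
    mus = np.asarray(mus, dtype=float)
    L = np.linalg.cholesky(Lam)                      # Lam = L L'
    base = mus[-1]
    z = np.linalg.solve(L, x - base)                 # L^{-1}(x - mu_k)
    D = np.linalg.solve(L, (mus[:-1] - base).T)      # L^{-1}(mu_i - mu_k), as columns
    coef, *_ = np.linalg.lstsq(D, z, rcond=None)
    resid = z - D @ coef
    a = np.append(coef, 1.0 - coef.sum())
    return float(resid @ resid), a

rng = np.random.default_rng(3)
p, k = 6, 3
A = rng.normal(size=(p, p))
Lam = A @ A.T + np.eye(p)            # hypothetical known dispersion matrix
mus = rng.normal(size=(k, p))        # hypothetical known group means
x = rng.normal(size=p)               # observed specimen
chi2, a = hyperplane_chi2(x, mus, Lam)
print(chi2, a)   # chi2 is referred to a chi-square on p - k + 1 = 4 d.f.
```

With k = 2 the function reduces to the line criterion (3.3), which gives a convenient check.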
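5. TESTS WHEN THE COMMON DISPERSION MATRIX IS ESTIMATED
In case $\Lambda$ is estimated by $S$, which has Wishart's distribution on n d.f., the statistic

$$\frac{n - p + k}{n(p - k + 1)}\,\min_{\sum a_i = 1}\Big(x - \sum a_i\mu_i\Big)'S^{-1}\Big(x - \sum a_i\mu_i\Big) \qquad \ldots (5.1)$$

which is a constant times (4.1) with $\Lambda^{-1}$ replaced by $S^{-1}$, has the F distribution on $(p - k + 1)$ and $(n - p + k)$ d.f. In the case k = 2, the F statistic is

$$\frac{n - p + 2}{n(p - 1)}\left[(x - \mu_2)'S^{-1}(x - \mu_2) - \frac{\{\delta'S^{-1}(x - \mu_2)\}^2}{\delta'S^{-1}\delta}\right] \qquad \ldots (5.2)$$

with $p - 1$ and $n - p + 2$ d.f. These results follow from a general distribution problem solved in an earlier paper (Rao, 1959).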


There is the further problem of testing the hypothesis that the specimen belongs to a particular group, using the estimated discriminant function or functions in the case of more than two alternatives. For k = 2, the test criterion for the hypothesis that the specimen belongs to (say) the second group is

$$(n - p + 1)\,\frac{\{\delta'S^{-1}(x - \mu_2)\}^2 / \delta'S^{-1}\delta}{n + (x - \mu_2)'S^{-1}(x - \mu_2) - \{\delta'S^{-1}(x - \mu_2)\}^2 / \delta'S^{-1}\delta} \qquad \ldots (5.3)$$

which is F on 1 and $(n - p + 1)$ d.f. A similar statistic with an F distribution on (k − 1) and (n − k + 3) d.f. can be constructed in the case of k alternatives.
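A minimal Python sketch of the two-group statistics (5.2) and (5.3) follows, assuming S is scaled so that nS has a Wishart distribution on n d.f. (so S below is a sum of n outer products divided by n). The simulated data are hypothetical. The bracket in (5.2) equals Hotelling's $T^2$ computed from the p − 1 linear functions orthogonal to δ, and the test block checks that identity through an explicit null-space basis.

```python
import numpy as np

def rao_f_statistics(x, mu1, mu2, S, n):
    """F statistics (5.2) and (5.3) for two specified means and an estimated
    dispersion matrix S, assumed scaled so that n*S is Wishart on n d.f."""
    p = len(x)
    delta = mu1 - mu2
    d = x - mu2
    total = d @ np.linalg.solve(S, d)                 # (x - mu2)' S^{-1} (x - mu2)
    Si_delta = np.linalg.solve(S, delta)
    along = (Si_delta @ d) ** 2 / (delta @ Si_delta)  # squared discriminant part
    off_line = total - along                          # S-version of (3.3)
    F52 = (n - p + 2) / (n * (p - 1)) * off_line      # F on (p - 1, n - p + 2) d.f.
    F53 = (n - p + 1) * along / (n + off_line)        # F on (1, n - p + 1) d.f.
    return F52, F53

rng = np.random.default_rng(9)
p, n = 5, 40
A = rng.normal(size=(p, p))
Lam = A @ A.T + np.eye(p)                          # hypothetical true dispersion
Z = rng.multivariate_normal(np.zeros(p), Lam, size=n)
S = Z.T @ Z / n                                    # estimate on n d.f.
mu1, mu2 = rng.normal(size=(2, p))
x = rng.multivariate_normal(mu2, Lam)              # specimen drawn from group 2
print(rao_f_statistics(x, mu1, mu2, S, n))
```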


No exact tests of the types (5.2) and (5.3) seem to be available when the mean values of the two alternative populations are themselves estimated. An approximate test of the type (5.2) is provided by the smallest root of the determinantal equation defined by Fisher (1939). The further problem of using an estimated discriminant function as in (5.3) is not satisfactorily solved.


6. SIZE AND SHAPE FUNCTIONS 


Penrose (1947) defined certain linear functions of measurements as representing 
the size and shape of an organism. More general functions were constructed by the 
author (Rao, 1958) to examine differences in size and shape between groups of 
organisms. These concepts are further generalised in this section. 
Let us consider a linear function

$$X = a_1x_1 + a_2x_2 + \cdots + a_kx_k \qquad \ldots (6.1)$$

of k variables $x_1, \ldots, x_k$, with unit variance, i.e.,

$$\sum_i\sum_j a_ia_j\Lambda_{ij} = 1 \qquad \ldots (6.2)$$

where $\Lambda_{ij}$ is the covariance between $x_i$ and $x_j$.

We wish to choose the coefficients $a_1, \ldots, a_k$ in such a way that a given increase in X produces, on the average, maximum changes, in given directions and given ratios, in a subset of the variables, (say) $x_1, \ldots, x_s$. If the regression of $x_i$ on X is taken to be linear, then the average change in $x_i$ for a unit change in X is exactly the regression coefficient of $x_i$ on X. Let the ratios of specified changes in $x_1, \ldots, x_s$, including signs, be $p_1 : p_2 : \cdots : p_s$. Using a multiplier α, we write the equations

$$
\begin{aligned}
\sum_j \Lambda_{ij}a_j &= \alpha p_i, & i &= 1, \ldots, s,\\
\sum_j \Lambda_{ij}a_j &= \alpha \gamma_i, & i &= s+1, \ldots, k,
\end{aligned}
\qquad \ldots (6.3)
$$

where $\gamma_{s+1}, \ldots, \gamma_k$ are arbitrary constants. The algebraic problem is that of maximising α with the restriction (6.2) on the $a_i$ by suitably choosing the $\gamma_i$. Let $\Lambda^{-1}$ be partitioned as

$$\Lambda^{-1} = (\Lambda^1 : \Lambda^2)$$

such that $\Lambda^1$ is of order $(k \times s)$ and $\Lambda^2$ of order $(k \times (k - s))$, and let p and γ be the column vectors of elements $p_i$ and $\gamma_i$ respectively. The solution for a, the column vector of the $a_i$, is

$$a = \alpha\,\Lambda^{-1}\begin{pmatrix} p \\ \gamma \end{pmatrix} = \alpha(\Lambda^1 p + \Lambda^2\gamma). \qquad \ldots (6.4)$$

Using (6.2),

$$\alpha^2 = \frac{1}{(\Lambda^1 p + \Lambda^2\gamma)'\Lambda(\Lambda^1 p + \Lambda^2\gamma)}. \qquad \ldots (6.5)$$

α is a maximum when the denominator in (6.5) is a minimum, i.e., γ is chosen to satisfy

$$\Lambda^{2\prime}\Lambda\Lambda^2\gamma = -\Lambda^{2\prime}\Lambda\Lambda^1 p. \qquad \ldots (6.6)$$

Having chosen γ, we determine α from (6.5) and then a from (6.4) to obtain the desired linear function.
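The three steps (6.6), (6.5) and (6.4) translate directly into a few lines of linear algebra. The Python sketch below follows them for a hypothetical dispersion matrix of k = 5 variables, with the ratios 1 : 1 : 1 specified for the first s = 3, which gives a size function in the sense defined later in this section. The function name is illustrative.

```python
import numpy as np

def size_shape_function(Lam, p_ratios):
    """Coefficients a of X = a'x following (6.3)-(6.6), and the multiplier alpha.

    Lam: (k x k) dispersion matrix of x_1..x_k; p_ratios: the s specified
    ratios p_1 : ... : p_s for the first s variables."""
    Lam = np.asarray(Lam, float)
    p_vec = np.asarray(p_ratios, float)
    k, s = Lam.shape[0], len(p_vec)
    Lam_inv = np.linalg.inv(Lam)
    L1, L2 = Lam_inv[:, :s], Lam_inv[:, s:]        # Lam^{-1} = (Lam^1 : Lam^2)
    if k > s:
        # (6.6): Lam^2' Lam Lam^2 gamma = -Lam^2' Lam Lam^1 p
        gamma = np.linalg.solve(L2.T @ Lam @ L2, -L2.T @ Lam @ L1 @ p_vec)
    else:
        gamma = np.zeros(0)
    v = L1 @ p_vec + L2 @ gamma                     # a / alpha, from (6.4)
    alpha = 1.0 / np.sqrt(v @ Lam @ v)              # (6.5), taking alpha > 0
    return alpha * v, alpha

rng = np.random.default_rng(6)
k, s = 5, 3
A = rng.normal(size=(k, k))
Lam = A @ A.T + np.eye(k)                 # hypothetical dispersion matrix
p_size = np.array([1.0, 1.0, 1.0])        # size: all three characters increase together
a, alpha = size_shape_function(Lam, p_size)
print("coefficients:", a)
print("regression coefficients of x_1..x_s on X:", (Lam @ a)[:s])  # alpha * p
```

Because the denominator in (6.5) is a convex quadratic in γ, no other choice of γ gives a larger α. The test block checks this against random γ.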
An alternative way of specifying the linear function (6.1) is as follows. The equations ensuring a given ratio of the regression coefficients of $x_i$ on X, $i = 1, \ldots, s$, are

$$\sum_j \Lambda_{ij}a_j = \alpha p_i, \qquad i = 1, \ldots, s. \qquad \ldots (6.7)$$

The residual variance of $x_i$ given X is

$$\Lambda_{ii} - \frac{\alpha^2 p_i^2}{a'\Lambda a}. \qquad \ldots (6.8)$$

We may choose a and α, subject to the restrictions (6.7), to minimise the sum of the residual variances for the s variables $x_1, \ldots, x_s$,

$$\sum_{i=1}^{s}\left(\Lambda_{ii} - \frac{\alpha^2 p_i^2}{a'\Lambda a}\right),$$

or, equivalently, to maximise $\alpha^2 / a'\Lambda a$. The equations leading to the determination of a are the same as (6.4) and (6.6), thus establishing the equivalence of the two problems.

If the $p_i$ are all positive, we may call (6.1) a size function with respect to the characters $x_1, \ldots, x_s$, in the sense that an increase in the value of X results in an increase, on the average, in each of the characters $x_1, \ldots, x_s$. By choosing some of the $p_i$ to be positive and others negative, we obtain a function with the property that an increase in its value increases the value of some characters and decreases the value of some others. Such a function may be called a shape function.

It may be noted that in the construction of a size or a shape function for $x_1, \ldots, x_s$ we have also used the other measurements $x_{s+1}, \ldots, x_k$, and further that the linear function (6.1) constructed by the method indicated is invariant for linear transformations of the extra variables $x_{s+1}, \ldots, x_k$.

The present approach admits the possibility of constructing a linear function 
of any set of anthropometric measurements to represent shape of the head and to 
discriminate between long and broad headed people, instead of using the cephalic 
index. Such a linear function was used in an earlier paper (Rao, 1956) to study 
differences in head shape of castes and tribes belonging to different States in India. 

A closely related problem is that of determining a linear function X of k variables such that the sum of the residual variances of $x_1, \ldots, x_s$ given X is a minimum, without placing any restrictions on the regression coefficients. Since the sum of residual variances is

$$\sum_{i=1}^{s}\left\{\Lambda_{ii} - \frac{(\sum_j \Lambda_{ij}a_j)^2}{a'\Lambda a}\right\},$$

the problem is the same as that of maximising

$$\frac{\sum_{i=1}^{s}(\sum_j \Lambda_{ij}a_j)^2}{a'\Lambda a} = \frac{a'Ba}{a'\Lambda a} \qquad \ldots (6.9)$$

where $B = \Lambda_s'\Lambda_s$, $\Lambda_s$ being the $(s \times k)$ matrix obtained by taking the first s rows of the matrix $\Lambda$. The vector a maximising the ratio (6.9) is then the latent vector corresponding to the maximum latent root of the determinantal equation

$$|B - \lambda\Lambda| = 0. \qquad \ldots (6.10)$$

In fact, by considering the first, second, third, ... latent vectors we obtain a series of linear functions $X_1, X_2, X_3, \ldots$ such that $X_1, \ldots, X_i$ (for any i) is an optimum set of i linear functions which can best predict the subset of variables $x_1, \ldots, x_s$. The determination of such functions is likely to be useful in specifying the sizes of ready-made garments, shoes, etc.
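The determinantal equation (6.10) is a generalized symmetric eigenproblem. The Python sketch below solves it by one standard route, not necessarily the author's: with the Cholesky factorization $\Lambda = LL'$, the problem $Ba = \lambda\Lambda a$ becomes the ordinary symmetric problem $L^{-1}BL^{-\prime}w = \lambda w$ with $a = L^{-\prime}w$. The dispersion matrix is hypothetical, and the function name is illustrative.

```python
import numpy as np

def best_predicting_functions(Lam, s):
    """Latent roots and vectors of |B - lambda Lam| = 0, B = Lam_s' Lam_s,
    with Lam_s the first s rows of Lam, in decreasing order of root."""
    Lam = np.asarray(Lam, float)
    Lam_s = Lam[:s, :]
    B = Lam_s.T @ Lam_s
    L = np.linalg.cholesky(Lam)
    # Reduce B a = lambda Lam a to a symmetric problem: M w = lambda w,
    # with M = L^{-1} B L^{-T} and a = L^{-T} w.
    Linv = np.linalg.inv(L)
    M = Linv @ B @ Linv.T
    roots, W = np.linalg.eigh(M)          # ascending order
    order = np.argsort(roots)[::-1]
    return roots[order], Linv.T @ W[:, order]

rng = np.random.default_rng(8)
k, s = 6, 3
A = rng.normal(size=(k, k))
Lam = A @ A.T + np.eye(k)                 # hypothetical dispersion matrix
roots, vectors = best_predicting_functions(Lam, s)
a1 = vectors[:, 0]
Lam_s = Lam[:s, :]
ratio = (a1 @ Lam_s.T @ Lam_s @ a1) / (a1 @ Lam @ a1)
print(roots)          # at most s of them are non-zero
print(ratio)          # equals the largest root
```

For $X_1 = a_1'x$, the sum of the residual variances of $x_1, \ldots, x_s$ is then $\sum_{i \le s}\Lambda_{ii}$ minus the largest root.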
EQUALLY CORRELATED RANDOM VARIABLES*


By SIMEON M. BERMAN
Columbia University
SUMMARY. A representation theorem for a second order stationary sequence with constant covariance is given, and is applied to finding the limiting distribution of the maximum term in a sequence of equally correlated Gaussian random variables.
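1. INTRODUCTION
A sequence $\{X_n : n \ge 1\}$ of random variables on a probability space $(\Omega, \mathcal{A}, P)$ is said to be equally correlated if

$$EX_n = 0, \quad EX_n^2 = 1 \quad (n \ge 1) \qquad \ldots (1.1)$$

$$EX_nX_m = \rho \quad (m \ne n) \qquad \ldots (1.2)$$

where $0 \le \rho < 1$. The number ρ is restricted to the indicated range, for if ρ = 1,

$$X_1 = X_2 = \cdots = X_n = \cdots$$

with probability 1; the case ρ < 0 does not arise, since the determinant of the covariance matrix of the first n variables,

$$\begin{vmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{vmatrix},$$

must be non-negative, i.e.

$$[n\rho + (1 - \rho)](1 - \rho)^{n-1} \ge 0;$$

hence $\rho \ge -1/(n-1)$ for every n, so that ρ is non-negative. If ρ = 0, the random variables $\{X_n\}$ are orthogonal.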
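The determinant formula behind this argument can be checked numerically. The Python sketch below compares $[n\rho + (1 - \rho)](1 - \rho)^{n-1}$ with a direct determinant for n = 6. For that n the bound is $\rho \ge -1/5 = -0.2$, so the determinant is negative at ρ = −0.3, zero at ρ = −0.2 and positive beyond. The chosen values of ρ are illustrative.

```python
import numpy as np

def equicorrelation_det(n, rho):
    """Determinant of the n x n matrix with unit diagonal and rho elsewhere."""
    return (n * rho + 1 - rho) * (1 - rho) ** (n - 1)

n = 6
for rho in (-0.3, -0.2, 0.0, 0.5):
    M = np.full((n, n), rho) + (1 - rho) * np.eye(n)
    print(rho, equicorrelation_det(n, rho), np.linalg.det(M))
```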
If $\{X_n\}$ is a sequence of exchangeable random variables, i.e. a sequence whose finite dimensional distributions are invariant under permutations of the indices, and if the first two moments exist and satisfy (1.1), then (1.2) holds, so that the $\{X_n\}$ are equally correlated.

Exchangeable random variables have a well-known representation as "conditionally independent" random variables (Loève, 1955). In this paper a representation for equally correlated random variables will be derived and will be used to get an explicit representation of exchangeable Gaussian variables.
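2. THE REPRESENTATION THEOREM
Theorem: Let $\{X_n : n \ge 1\}$ be a sequence of equally correlated random variables. Then there exist a random variable Y and a sequence of random variables $\{U_n\}$ such that

$$X_n = U_n + Y \quad (n \ge 1) \qquad \ldots (2.1)$$

where

$$EU_n = EY = 0; \quad EU_nY = 0; \quad EU_n^2 = 1 - \rho; \quad EY^2 = \rho; \quad EU_nU_m = 0 \ (n \ne m). \qquad \ldots (2.2)$$

If the $X_n$ are Gaussian, then $Y, U_1, U_2, \ldots$ are Gaussian and independent.

Proof: Since $\{X_n\}$ is a wide sense stationary process, it follows from the ergodic theorem that there exists a random variable Y with a finite second moment such that

$$\mathop{\mathrm{l.i.m.}}_{n\to\infty}\ \frac{X_1 + \cdots + X_n}{n} = Y. \qquad \ldots (2.3)$$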


undation, while the author was ап 


The sequence $\{U_n\}$ is now defined as

$$U_n = X_n - Y \quad (n \ge 1). \qquad \ldots (2.4)$$

$U_n$ is well-defined since $X_n$ and Y are finite with probability one. The properties in (2.2) are immediate consequences of (2.3) and (2.4) and of the fact that mean square convergence implies the convergence of the first two moments; (2.1) follows from (2.4).
If $\{X_n\}$ is Gaussian, it follows from the proof given above that $Y, U_1, U_2, \ldots$ are jointly Gaussian; their independence then follows from the orthogonality relations in (2.2).
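The construction in the proof can be illustrated by simulation. The Python sketch below assumes a Gaussian equally correlated sequence, generated for convenience from the representation itself. It checks that the finite-n mean in (2.3) approaches Y in mean square, at rate $(1 - \rho)/n$, that neighbouring terms have correlation close to ρ, and that the finite-n residuals from (2.4) have variance close to $1 - \rho$. The value of ρ, the sizes and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n_terms, n_reps = 0.4, 1000, 1000
# Hypothetical construction: X_n = U_n + Y with Y ~ N(0, rho) and
# independent U_n ~ N(0, 1 - rho), one row per simulated sequence.
Y = rng.normal(0.0, np.sqrt(rho), size=n_reps)
U = rng.normal(0.0, np.sqrt(1.0 - rho), size=(n_reps, n_terms))
X = U + Y[:, None]

Y_hat = X.mean(axis=1)                 # finite-n version of the limit (2.3)
U_hat = X - Y_hat[:, None]             # finite-n version of (2.4)
mse = np.mean((Y_hat - Y) ** 2)        # expected value (1 - rho) / n_terms = 0.0006
corr12 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
print(f"E(Y_hat - Y)^2 ~ {mse:.5f}")
print(f"corr(X_1, X_2) ~ {corr12:.3f}  (rho = {rho})")
print(f"var(U_hat) ~ {U_hat.var():.3f}  (1 - rho = {1 - rho})")
```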


This theorem provides us with the de Finetti representation of the probability measure on a sequence of exchangeable Gaussian variables with equal correlation ρ > 0. The measure is of the form

$$P(\cdot) = \int_{-\infty}^{\infty} P_y(\cdot)\, d\Phi(y/\sqrt{\rho}) \qquad \ldots (2.5)$$

where for each y, $P_y(\cdot)$ is the probability measure on a sequence of independent Gaussian random variables with mean y and variance $1 - \rho$, and where $\Phi(u)$ is the standard normal integral.

3. APPLICATION TO THE DISTRIBUTION OF THE MAXIMUM 

It was indicated in the first section that the correlation p between any pair of п 
equally correlated random variables must be greater than —(n—1)-1, so that n such random 
variables are part of an infinite sequence of equally correlated random variables only if p > 0. 
The distribution of the maximum in a set of n equally correlated Gaussian variables with 
p > —(n—1)* was computed from the inversion formula for the multivariate characteristic 
function by Kudo, (1958). Since the representation (2.5) is valid only for infinite sequences 
it is not applicable to the general problem considered by Kudo (1958); however, if we 
restrict p to nonnegative values (2.5) may be used. Then, 

max(X,, ..., Xn) = max(U,+ Y, ... U,4- Y) = max(U,, ..., U,)-- Y, © (3.1) 
во that the distribution of max (Xi ..., Xn) is simply the convolution of the distribution 
of the maximum of n independent Gaussian variables with the distribution of Y. 

As $n \to \infty$, the limiting distribution of $\max(X_1, \ldots, X_n)$ after suitable normalization is the distribution of $Y$, i.e. a Gaussian distribution. In fact, it has been shown (Gnedenko, 1943) that if $\{U_n;\ n \ge 1\}$ are independent Gaussian variables with mean zero and variance $1-\rho$, then

$$\mathop{\mathrm{plim}}_{n\to\infty}\left[\max(U_1, \ldots, U_n) - \sqrt{2(1-\rho)\log n}\,\right] = 0.$$

Therefore, it follows from (3.1) that the distribution of

$$\max(X_1, \ldots, X_n) - \sqrt{2(1-\rho)\log n}$$

converges to the distribution of $Y$. On the other hand, if $X_1, X_2, \ldots, X_n, \ldots$ are independent Gaussian variables, then the limiting distribution function of the suitably normalized maximum is $\exp(-e^{-x})$ (Cramér, 1946).
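The centering in Gnedenko's result can be seen numerically. As a sketch that is not part of the original argument, the exact median of $\max(U_1, \ldots, U_n)$ solves $\Phi(m/\sqrt{1-\rho})^n = \tfrac12$ and can be compared with $\sqrt{2(1-\rho)\log n}$; the gap shrinks toward zero, though only slowly. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def median_of_max(n, var):
    """Exact median m of max(U_1..U_n), U_i iid N(0, var): Phi(m / sd) ** n = 1/2."""
    p = math.exp(-math.log(2.0) / n)          # p = 2 ** (-1/n), computed stably
    return math.sqrt(var) * NormalDist().inv_cdf(p)

def centering(n, var):
    """Centering constant sqrt(2 var log n) in Gnedenko's result."""
    return math.sqrt(2.0 * var * math.log(n))

rho = 0.3
gaps = [centering(n, 1.0 - rho) - median_of_max(n, 1.0 - rho) for n in (10**3, 10**6, 10**12)]
print(gaps)   # positive and shrinking, but only slowly
```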
STUDENTISATION OF TWO-STAGE SAMPLE MEANS FROM
NORMAL POPULATIONS WITH UNKNOWN VARIANCES* 


I. GENERAL THEORY AND APPLICATION TO THE CONFIDENCE 
ESTIMATION AND TESTING OF THE DIFFERENCE IN 
POPULATION MEANS 


By HAROLD RUBEN** 
Columbia University 


SUMMARY. A two-stage sampling procedure is developed for the estimation of means, and functions of means, of unknown normal populations by means of confidence sets of predetermined dimensions, and, concomitantly, for testing hypotheses relating to these means with power independent of the unknown variances. The estimation and test procedures are shown to be conservative and of high efficiency relative to the corresponding fixed sample size procedures when the variances are known.

The general theory is then used to obtain confidence intervals of preassigned confidence coefficient and length for the difference in means of two unknown normal populations, as well as tests for the equality of the means with power independent of the variances. The efficiency of these procedures for the two-mean problem is discussed in detail, with special emphasis on the minimisation of the expected total amount of sampling.


1. INTRODUCTION 


In a previous paper (Ruben, 1961), Stein's results (Stein, 1945) relating to the studentisation of sample means from normal populations which are assumed to have equal though unknown variances by means of two-stage sampling were derived in a different manner and in somewhat more extended and generalised form. There it was stressed that one of the advantages of the alternative method of approach is that it lends itself naturally to generalisation in the direction of the studentisation of sample means from normal populations with totally unknown variances. We propose here to carry through this programme.

Specifically, this series of papers investigates the problem of obtaining confidence sets with predetermined dimensions and confidence coefficient for the means, or for certain functions of the means, of a finite number of unconnected and unknown normal populations. (See also Ruben, 1950.) Preliminary samples are drawn which enable one to form estimates of the unknown variances, and these estimates determine a rule for when to stop sampling. The sequential estimation and the corresponding similar test procedures involve the choice of constants $n_{0i}$, $k_i$ which determine, respectively, the sizes of the preliminary samples and (approximately) the average total number of items sampled from the $i$-th population. These constants are chosen so as to meet the specifications laid down for the nature of the confidence sets or the form of the power functions.
A more significant interpretation of the constants or scale factors $1/k_i$ is that they play the same role in the present scheme of sequential sampling as the estimated standard errors of the means play in sampling with a predetermined number of items. This implies that the same scheme of sequential sampling allows one, in effect, to studentise the whole block of sample means from normal populations. Furthermore, in sampling from unknown normal populations with a predetermined number of items, the statistician is, as it were, at the mercy of the sampled values of the estimated standard errors of the means from the separate populations. It follows that the geometrical properties of his confidence sets and the power of his tests, even if available, are necessarily subject to chance fluctuation in the one case and dependent on the unknown variances in the other. On the other hand, in sequential sampling what corresponds to the estimated standard errors of the means, viz. the $1/k_i$, may be fixed in advance. It follows that the geometrical properties of the confidence regions and the power of the tests may likewise be fixed in advance, before sampling commences, independently of the unknown variances. In particular, confidence intervals of predetermined length and confidence coefficient, as well as tests of predetermined power, may be obtained for the difference in means of two normal populations with unknown variances. This is discussed in some detail in the present paper.


It is shown that if the variance of the $i$-th population exceeds $n_{0i}/k_i^2$, then the efficiencies of the given sequential estimation procedures, as compared to the corresponding estimation procedures based on a predetermined number of observations when the variances are known, are nearly equal to unity, even when the $n_{0i}$ are fairly small, and converge quite rapidly to unity as $n_{0i} \to \infty$. The question of improving on the efficiencies when the variances are small is briefly touched upon in the concluding paper.


2. DESCRIPTION OF THE SAMPLING SCHEME 


We suppose that there are $r\,(>1)$ unconnected normal populations with unknown means and unknown variances, $\mu_i, \sigma_i^2$ $(i = 1, 2, \ldots, r)$. Preliminary independent random samples of predetermined size $n_{0i}$ are drawn from the populations, the results of such sampling being a set of observed values $x_{ij}$ $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, n_{0i})$. Further observations $x_{ij}$ $(i = 1, 2, \ldots, r;\ j = n_{0i}+1, n_{0i}+2, \ldots, n_i)$ are then obtained, where

$$n_i = \max\left(\{k_i^2 s_{0i}^2\},\ n_{0i}\right) \qquad (i = 1, 2, \ldots, r) \qquad (2.1)$$

and

$$s_{0i}^2 = \frac{1}{n_{0i}-1}\left[\sum_{j=1}^{n_{0i}} x_{ij}^2 - \frac{1}{n_{0i}}\Big(\sum_{j=1}^{n_{0i}} x_{ij}\Big)^2\right], \qquad (2.2)$$

the $k_i$ being predetermined real constants, with $\{q\}$ denoting the smallest integer not less than $q$. (The $s_{0i}^2$ are the pilot sample variance estimates based on $n_{0i}-1$ degrees of freedom.) We shall refer to the scheme or rule of sampling described by (2.1) and (2.2) as $S(n_{01}, \ldots, n_{0r};\ k_1, \ldots, k_r)$. It will be noted that $S$ amounts to applying a Stein two-stage sampling procedure (Stein, 1945) to each population separately. The probability distributions of the $n_i$ are therefore given by (cf. Stein, 1945; Ruben, 1961)

$$p_{m_i}(\sigma_i) = \begin{cases} 0 & (m_i < n_{0i}),\\ F_{\nu_{0i}}(n_{0i}h_i) & (m_i = n_{0i}),\\ F_{\nu_{0i}}(m_i h_i) - F_{\nu_{0i}}\big((m_i-1)h_i\big) & (m_i > n_{0i}), \end{cases} \qquad (2.3)$$

where

$$p_{m_i}(\sigma_i) = \Pr\{n_i = m_i \mid \sigma_i\}, \qquad (2.4)$$

$$h_i = \nu_{0i}/(k_i^2\sigma_i^2), \qquad (2.5)$$

$$\nu_{0i} = n_{0i} - 1, \qquad (2.6)$$

and $F_\nu(\cdot)$ denotes the distribution function of $\chi^2$ with $\nu$ degrees of freedom. Furthermore, upper and lower bounds for $E n_i$ may be set as follows:

$$n_{0i}F_{\nu_{0i}}(n_{0i}h_i) + \frac{\nu_{0i}}{h_i}\left[1 - F_{\nu_{0i}+2}(n_{0i}h_i)\right] < E n_i < n_{0i}F_{\nu_{0i}}(n_{0i}h_i) + \frac{\nu_{0i}}{h_i}\left[1 - F_{\nu_{0i}+2}(n_{0i}h_i)\right] + 1 - F_{\nu_{0i}}(n_{0i}h_i), \qquad (2.7)$$

whence, in particular,

$$E n_i \sim k_i^2\sigma_i^2, \qquad (2.8)$$

the approximation being reasonably good when $\nu_{0i}/h_i = k_i^2\sigma_i^2 > n_{0i}$. The expected total amount of sampling is, of course, $E n = \sum_{i=1}^{r} E n_i$.
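The distribution (2.3) and the bounds (2.7) can be evaluated directly. The sketch below assumes an even number of pilot degrees of freedom $\nu_{0i}$ (as with the pilot sizes 7 and 13 used later in the paper), which allows the closed-form $\chi^2$ tail $e^{-x/2}\sum_{j<\nu/2}(x/2)^j/j!$; it computes $E n_i$ through $E n_i = n_{0i} + \sum_{m \ge n_{0i}} \Pr\{n_i > m\}$. All function names are hypothetical.

```python
import math

def chi2_sf_even(x, nu):
    """Pr(chi2_nu > x) for EVEN nu via the closed-form Poisson sum (assumption of this sketch)."""
    if x <= 0.0:
        return 1.0
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, nu // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def expected_size(n0, k, sigma):
    """E n_i for rule (2.1): n_i = max(ceil(k^2 s0^2), n0), with s0^2 ~ sigma^2 chi2_nu / nu.

    Uses E n_i = n0 + sum_{m >= n0} Pr(n_i > m) and Pr(n_i > m) = 1 - F_nu(m h_i).
    """
    nu = n0 - 1
    if nu % 2:
        raise ValueError("this sketch needs an even nu_0 = n0 - 1")
    h = nu / (k * k * sigma * sigma)      # h_i of (2.5)
    total, m = float(n0), n0
    while True:
        tail = chi2_sf_even(m * h, nu)    # Pr(n_i > m)
        if tail < 1e-15:
            return total
        total += tail
        m += 1

def bounds_27(n0, k, sigma):
    """Lower and upper bounds (2.7) for E n_i."""
    nu = n0 - 1
    h = nu / (k * k * sigma * sigma)
    f0 = 1.0 - chi2_sf_even(n0 * h, nu)                        # F_nu(n0 h)
    lower = n0 * f0 + (nu / h) * chi2_sf_even(n0 * h, nu + 2)
    return lower, lower + (1.0 - f0)

for sigma in (3.0, 10.0):
    lo, hi = bounds_27(7, 1.0, sigma)
    print(sigma, lo, expected_size(7, 1.0, sigma), hi)   # (2.8): E n_i is roughly k^2 sigma^2
```

The lower bound in (2.7) is $E\max(k_i^2 s_{0i}^2, n_{0i})$; rounding up to an integer adds less than one item, and only on the event that the second stage is used.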


3. THE EXACT AND LIMITING PROBABILITY DISTRIBUTIONS OF THE TWO-STAGE


SAMPLE MEANS AND OF FUNCTIONS OF THE MEANS 


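Denote the two-stage sample means by

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \qquad (i = 1, 2, \ldots, r),$$

and the vector $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r)$ by $\bar{x}$. Similarly, the vectors $(\mu_1, \mu_2, \ldots, \mu_r)$ and $(\sigma_1, \sigma_2, \ldots, \sigma_r)$ will be denoted, for convenience, by $\mu$ and $\sigma$ respectively, and the vectors $(n_1, n_2, \ldots, n_r)$, $(m_1, m_2, \ldots, m_r)$ by $n$ and $m$. Finally, the diagonal $r \times r$ matrix with diagonal elements $\sigma_i^2/m_i$ $(i = 1, 2, \ldots, r)$ will be denoted by $V_{\sigma;m}$, and correspondingly the $r \times r$ matrix with diagonal elements $\sigma_i/\sqrt{m_i}$ by $V_{\sigma;m}^{1/2}$. On indicating $\Pr\{\bar{x} \in R \mid \mu, \sigma\}$, for a given (Borel-measurable) region $R$ in Euclidean $r$-space, by $P(R \mid \mu, \sigma)$, we have

$$P(R \mid \mu, \sigma) = \sum_{m_1=1}^{\infty} \cdots \sum_{m_r=1}^{\infty} p_{m_1}(\sigma_1) \cdots p_{m_r}(\sigma_r)\, \Pr\{\bar{x} \in R \mid n = m;\ \mu, \sigma\}. \qquad (3.1)$$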




SANKHYĀ : THE INDIAN JOURNAL OF STATISTICS : Series A


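Again, the distribution of $\bar{x}$ conditional on $n$ (say, $n = m$) is multivariate normal with expectation vector $\mu$ and variance-covariance matrix $V_{\sigma;m}$. This follows from the independence of the vector of pilot sample variances $(s_{01}^2, \ldots, s_{0r}^2)$ and the vector of pilot sample means $(\bar{x}_{01}, \ldots, \bar{x}_{0r})$. (The fixing of $n$ is equivalent, by (2.1), to a constraint on the pilot sample variances, which are distributed independently of the pilot sample means.) (3.1) then becomes

$$P(R \mid \mu, \sigma) = \sum_{m_1=1}^{\infty} \cdots \sum_{m_r=1}^{\infty} p_{m_1}(\sigma_1) \cdots p_{m_r}(\sigma_r)\, Q_m(R;\ \mu, V_{\sigma;m}), \qquad (3.2)$$

where

$$Q_m(R;\ \mu, V_{\sigma;m}) = \Pr\{\bar{x} \in R \mid n = m;\ \mu, \sigma\} = (2\pi)^{-r/2} \int \cdots \int_{V_{\sigma;m}^{1/2}\xi + \mu \,\in\, R} \exp\Big(-\tfrac{1}{2}\sum_{i=1}^{r}\xi_i^2\Big)\, d\xi_1 \cdots d\xi_r. \qquad (3.3)$$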

(3.2) and (3.3) give the required exact probability distribution of $\bar{x}$. We shall now obtain the limiting probability distribution of $\bar{x}$, i.e.

$$\lim_{\sigma\to\infty} P(R \mid \mu, \sigma),$$

where it is understood that the $\sigma_i \to \infty$ $(i = 1, 2, \ldots, r)$ arbitrarily and independently of each other. For this purpose, let $w_i$ $(i = 1, 2, \ldots, r)$ denote $r$ independent $\chi^2$ variates with $\nu_{0i}$ degrees of freedom, and note that

$$\frac{\sigma_i}{\sqrt{m_i}} = \frac{\sqrt{\nu_{0i}}}{k_i\sqrt{m_i h_i}}, \qquad (3.4)$$

where $h_i$ and $\nu_{0i}$ have been defined in (2.5) and (2.6), respectively. Denote further the $r \times r$ diagonal matrix with diagonal elements $\sqrt{\nu_{0i}}/(k_i\sqrt{u_i})$ by $U$. Then, analogously to the case of equal $\sigma_i^2$ (see Ruben, 1961), the right-hand member of (3.2) may be identified as a Darboux sum associated with the $r$-fold integral

$$\int_{E_r^+} Q^*(R;\ \mu, U)\, dG \qquad (3.5)$$

and a specific division of Euclidean $r$-space. In (3.5),

$$Q^*(R;\ \mu, U) = Q^*\Big(R;\ \mu, \frac{\sqrt{\nu_{01}}}{k_1\sqrt{u_1}}, \ldots, \frac{\sqrt{\nu_{0r}}}{k_r\sqrt{u_r}}\Big) = (2\pi)^{-r/2}\int\cdots\int_{U\xi + \mu \,\in\, R} \exp\Big(-\tfrac{1}{2}\sum_{i=1}^{r}\xi_i^2\Big)\, d\xi_1\cdots d\xi_r, \qquad (3.6)$$

$E_r^+$ denotes the positive orthant of Euclidean $r$-space, and $G$ is the probability distribution in $r$-space generated by $r$ independent $\chi^2$ variates with $\nu_{0i}$ degrees of freedom $(i = 1, 2, \ldots, r)$, so that (3.5) is equivalent to

$$\int_0^\infty \cdots \int_0^\infty Q^*(R;\ \mu, U) \prod_{i=1}^{r} \frac{u_i^{\nu_{0i}/2-1} e^{-u_i/2}}{2^{\nu_{0i}/2}\,\Gamma(\nu_{0i}/2)}\, du_1 \cdots du_r. \qquad (3.7)$$
The division of $r$-space referred to is obtained by setting off points on the $u_i$-axis, distant $n_{0i}h_i, (n_{0i}+1)h_i, (n_{0i}+2)h_i, \ldots$ from the origin and constructing through these points $(r-1)$-flats which are orthogonal to the axis, for $i = 1, 2, \ldots, r$. This results in generating an infinite set of orthotopes such that the probability mass contained in the individual orthotopes is $\prod_{i=1}^{r} p_{m_i}(\sigma_i)$. (3.5) (or its equivalent (3.7)) then follows from (3.2), on using (3.4), replacing $m_i h_i$ by $u_i$ and letting $h_i \to 0$. We thus have

$$P^*(R \mid \mu) = \lim_{\sigma\to\infty} P(R \mid \mu, \sigma) = \int_{E_r^+} Q^*(R;\ \mu, U)\, dG. \qquad (3.8)$$

(3.8) gives the required limiting probability distribution of $\bar{x}$.
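In order to interpret further the probability in (3.8), consider the quantity

$$\Pr\Big\{\Big(\mu_1 + \frac{t_{\nu_{01}}}{k_1}, \ldots, \mu_r + \frac{t_{\nu_{0r}}}{k_r}\Big) \in R\Big\}, \qquad (3.9)$$

where the $t_{\nu_{0i}}$ $(i = 1, \ldots, r)$ are independent Student random variables with $\nu_{0i}$ degrees of freedom. Introduce $2r$ independent random auxiliary variables $\xi_i, w_i$, where the $\xi_i$ are normal with zero means and unit variances and the $w_i$ (as before) are $\chi^2$ variates with $\nu_{0i}$ degrees of freedom. Then, since $t_{\nu_{0i}}$ may be represented as $\sqrt{\nu_{0i}}\,\xi_i/\sqrt{w_i}$, (3.9) is equivalent to

$$\Pr\Big\{\Big(\mu_1 + \frac{\sqrt{\nu_{01}}\,\xi_1}{k_1\sqrt{w_1}}, \ldots, \mu_r + \frac{\sqrt{\nu_{0r}}\,\xi_r}{k_r\sqrt{w_r}}\Big) \in R\Big\}, \qquad (3.10)$$

which is equal to the right-hand member of (3.8). (Recall the definition of $Q^*$ in (3.6).) Recapitulating, for reference,

$$\lim_{\sigma\to\infty} \Pr\{\bar{x} \in R \mid \mu, \sigma\} = \Pr\Big\{\Big(\mu_1 + \frac{t_{\nu_{01}}}{k_1}, \ldots, \mu_r + \frac{t_{\nu_{0r}}}{k_r}\Big) \in R\Big\}. \qquad (3.11)$$

In particular, the limiting distribution function ($\sigma \to \infty$) of $\bar{x} - \mu$ is identical with the distribution function of the vector $(t_{\nu_{01}}/k_1, \ldots, t_{\nu_{0r}}/k_r)$.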
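Result (3.11) can be illustrated for a single population ($r = 1$) by simulating the rule (2.1)-(2.2) with a large $\sigma$ and comparing the proportion of runs with $|\bar{x} - \mu| \le c/k$ against $\Pr\{|t_{\nu_0}| \le c\}$. This is a Monte Carlo sketch under assumed settings ($n_0 = 7$, $k = 1$, $\sigma = 10$, and $c = 2.446912$, the upper 2.5% point of $t_6$ taken from standard tables); the helper names are hypothetical, and the Student probability is obtained by Simpson's rule rather than a library call.

```python
import math
import random

def t_central_prob(c, nu, steps=2000):
    """Pr(|t_nu| <= c), integrating the Student density with Simpson's rule (steps even)."""
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def dens(x):
        return const * (1.0 + x * x / nu) ** (-(nu + 1) / 2)

    h = c / steps
    s = dens(0.0) + dens(c)
    for i in range(1, steps):
        s += (4.0 if i % 2 else 2.0) * dens(i * h)
    return 2.0 * s * h / 3.0          # factor 2: the density is symmetric about 0

def two_stage_mean(mu, sigma, n0, k, rng):
    """One run of rule (2.1)-(2.2) for a single normal population; returns the final mean."""
    xs = [rng.gauss(mu, sigma) for _ in range(n0)]            # pilot sample
    m0 = sum(xs) / n0
    s2 = sum((x - m0) ** 2 for x in xs) / (n0 - 1)            # s_0^2 of (2.2)
    n = max(math.ceil(k * k * s2), n0)                        # n of (2.1)
    xs += [rng.gauss(mu, sigma) for _ in range(n - n0)]       # second stage
    return sum(xs) / n

rng = random.Random(2024)
n0, k, sigma, c = 7, 1.0, 10.0, 2.446912     # c: upper 2.5% point of t_6 (from tables)
reps = 10000
hits = sum(abs(two_stage_mean(0.0, sigma, n0, k, rng)) <= c / k for _ in range(reps))
print(hits / reps, t_central_prob(c, n0 - 1))   # both near 0.95 if (3.11) applies
```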
Finally, we wish to show that any statistical inference (such as confidence estimation or a test procedure) based on (3.8), i.e. in which (3.8) is used to compute the appropriate probabilities (e.g. confidence coefficients or risks of error) as approximations to the true probabilities, will be conservative in character, provided $Q^*$ satisfies certain monotonicity properties with respect to $u_1, \ldots, u_r$, or, equivalently, $Q_m$ satisfies certain monotonicity properties with respect to $\sigma_1, \ldots, \sigma_r$. Specifically, we show that

$$\inf_{\sigma} P(R \mid \mu, \sigma) = \lim_{\sigma\to\infty} P(R \mid \mu, \sigma), \qquad Q_m \text{ strictly decreasing in } \sigma_i \ (i = 1, 2, \ldots, r),$$

$$\sup_{\sigma} P(R \mid \mu, \sigma) = \lim_{\sigma\to\infty} P(R \mid \mu, \sigma), \qquad Q_m \text{ strictly increasing in } \sigma_i \ (i = 1, 2, \ldots, r). \qquad (3.12)$$
To prove (3.12) it is sufficient merely to verify that the Darboux sum referred to previously in eq. (3.2) is an upper sum, relative to the previous cellular division of $r$-space, provided $Q_m$, defined in (3.3), is strictly decreasing in each of the $\sigma_i$ (or, equivalently, $Q^*$, defined in (3.6), is strictly increasing in each of the $u_i$), and correspondingly that the Darboux sum is a lower sum provided $Q_m$ is strictly increasing in each of the $\sigma_i$ (or, equivalently, $Q^*$ is strictly decreasing in each of the $u_i$). We may then show from elementary properties of upper and lower Darboux sums, and analogously to the case where $\sigma$ is one-dimensional (cf. Ruben, 1961), that

$$P(R \mid \mu, \sqrt{C_1}\,\sigma_1, \ldots, \sqrt{C_r}\,\sigma_r) \begin{cases} \le P(R \mid \mu, \sigma_1, \ldots, \sigma_r), & Q_m \text{ strictly decreasing in the } \sigma_i\ (i = 1, \ldots, r),\\ \ge P(R \mid \mu, \sigma_1, \ldots, \sigma_r), & Q_m \text{ strictly increasing in the } \sigma_i\ (i = 1, \ldots, r), \end{cases} \qquad (3.13)$$

for all positive integral $C_i > 1$. (The increase of $\sigma_i^2$ by the factor $C_i$, i.e. the decrease of $h_i$ by the factor $C_i$, has the effect of superimposing new points of division on the original division of $r$-space.) The existence of a value of $\sigma$ for which $P(R \mid \mu, \sigma)$ is less than $P^*(R \mid \mu)$ ($Q_m$ decreasing in the $\sigma_i$), or for which $P(R \mid \mu, \sigma)$ exceeds $P^*(R \mid \mu)$ ($Q_m$ increasing in the $\sigma_i$), then leads to an immediate contradiction. The greatest lower bound with respect to $\sigma$ of $P(R \mid \mu, \sigma)$ in the one case, and the least upper bound in the other case, are thus attained asymptotically. Eq. (3.12) is thereby proved.

It appears likely on intuitive grounds that $P(R \mid \mu, \sigma)$ is strictly decreasing (increasing) in the $\sigma_i$ if $Q_m$ is strictly decreasing (increasing) in the $\sigma_i$. However, this awaits a formal proof.


4. ESTIMATION AND TESTING


Consider the problem of obtaining symmetrical confidence intervals of preassigned width $2a$ and confidence coefficient $1-\alpha$ for the difference in means, $\mu_1 - \mu_2$, of two normal populations with unknown variances $\sigma_1^2, \sigma_2^2$, and similarly the problem of testing $\mu_1 = \mu_2$ with power function independent of $\sigma_1^2$ and $\sigma_2^2$, by means of the two-stage sampling scheme $S(n_{01}, n_{02};\ k_1, k_2)$ described in Section 2. The problem of testing or estimating $\mu_1 - \mu_2$ using samples of fixed size is the much discussed Behrens-Fisher problem. (See e.g. Fisher, 1935, 1941, 1956.) Fisher's solution (first given in essence by Behrens (1929)) is to compound the fiducial distributions of the two separate population means. Thus, we may write

$$\mu_i = \bar{x}_i - \frac{s_i}{\sqrt{n_i}}\, t_{n_i-1} \qquad (i = 1, 2), \qquad (4.1)$$

where $\bar{x}_1, \bar{x}_2$ and $s_1^2, s_2^2$ are the two sample means and (unbiassed) variances based on samples of fixed size $n_1$ and $n_2$, while $t_{n_1-1}$ and $t_{n_2-1}$ denote independent Student variables with $n_1-1$ and $n_2-1$ degrees of freedom. We then have

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(s_1^2/n_1 + s_2^2/n_2\right)^{1/2}} = \sin\theta\, t_{n_1-1} - \cos\theta\, t_{n_2-1}, \qquad (4.2)$$

where

$$\tan\theta = \frac{s_1/\sqrt{n_1}}{s_2/\sqrt{n_2}}. \qquad (4.3)$$

For purposes of comparison with the subsequent two-stage sample solution, it is important to note that in Fisher's reasoning $\tan\theta$, the ratio of the estimated standard errors of the means, is regarded as constant (as are also the estimated means) once a particular pair of samples has been drawn from the two populations. In a sense, then, the field of probability inferences on $\mu_1 - \mu_2$ relates to a subset in which the $s_i^2$ as well as the $\bar{x}_i$ are held constant. Fisher's procedure for testing the null hypothesis $H_0: \mu_1 = \mu_2$ at significance level $\alpha$ against the set of alternative hypotheses $H_1: \mu_1 \ne \mu_2$ when the $\sigma_i$ are unknown is to base oneself on (4.2) and to reject $H_0$ when

$$\frac{|\bar{x}_1 - \bar{x}_2|}{\left(s_1^2/n_1 + s_2^2/n_2\right)^{1/2}} > d_{n_1-1,\,n_2-1;\,\theta;\,\alpha/2}, \qquad (4.4)$$

where $d_{n_1-1,\,n_2-1;\,\theta}$ is the weighted difference of two independent Student variables represented by the right-hand member of (4.2), and $d_{n_1-1,\,n_2-1;\,\theta;\,\alpha/2}$ is the two-sided $100\alpha\%$ significance point of $d_{n_1-1,\,n_2-1;\,\theta}$. Similarly, an interval for $\mu_1 - \mu_2$ with fiducial probability $1-\alpha$ is

$$\left(\bar{x}_1 - \bar{x}_2 - d_{n_1-1,\,n_2-1;\,\theta;\,\alpha/2}\Big(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\Big)^{1/2},\ \ \bar{x}_1 - \bar{x}_2 + d_{n_1-1,\,n_2-1;\,\theta;\,\alpha/2}\Big(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\Big)^{1/2}\right). \qquad (4.5)$$

It is well known, however, that the legitimacy of Fisher's solution has been seriously questioned, mainly on the grounds that $\theta$ cannot reasonably be regarded as constant after sampling for purposes of inference, so that Fisher's rule (4.4), when applied repeatedly to two fixed normal populations under identical conditions of sampling, would not lead to a proportion $\alpha$ of samples for which $H_0$ is rejected when it is true, independently of the value of $\sigma_1/\sigma_2$. (See e.g. Bartlett, 1936, 1937, 1956.)
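To make concrete the point that $\theta$ in Fisher's approach is computed from the samples, the following sketch evaluates the left-hand member of (4.2) and the angle of (4.3) for two fixed-size samples. The function name and the example data are hypothetical; obtaining the percentage point $d_{n_1-1,n_2-1;\theta;\alpha/2}$ needed for (4.4) is a separate step not attempted here.

```python
import math
from statistics import mean, variance

def behrens_fisher_stat(x1, x2, diff0=0.0):
    """Left-hand member of (4.2) with mu_1 - mu_2 = diff0, and theta of (4.3) in degrees.

    variance() is the unbiassed sample variance (divisor n - 1), as in the text.
    """
    se1 = math.sqrt(variance(x1) / len(x1))      # s_1 / sqrt(n_1)
    se2 = math.sqrt(variance(x2) / len(x2))      # s_2 / sqrt(n_2)
    stat = (mean(x1) - mean(x2) - diff0) / math.hypot(se1, se2)
    theta = math.degrees(math.atan2(se1, se2))   # tan(theta) = se1 / se2
    return stat, theta

stat, theta = behrens_fisher_stat([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(stat, theta)   # theta changes from one pair of samples to the next
```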


5. CONFIDENCE ESTIMATION

Reverting now to $S(n_{01}, n_{02};\ k_1, k_2)$, in order to generate confidence intervals of the prescribed type we need to base ourselves on the quantity

$$\Pr\{|(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)| < a\}, \qquad (5.1)$$

the region $R$ of Section 3 being defined in the $(\bar{x}_1, \bar{x}_2)$ plane by

$$R:\ |(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)| < a. \qquad (5.2)$$

From (3.11), the limiting distribution as $\sigma_1, \sigma_2 \to \infty$ of $(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)$ is the distribution of $t_{\nu_{01}}/k_1 - t_{\nu_{02}}/k_2$.
Consequently,

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(1/k_1^2 + 1/k_2^2\right)^{1/2}} \sim \sin\theta\, t_{\nu_{01}} - \cos\theta\, t_{\nu_{02}} = d_{\nu_{01},\,\nu_{02};\,\theta}, \qquad (5.3)$$

where

$$\tan\theta = k_2/k_1, \qquad (5.4)$$

and the symbol $\sim$ indicates the equality in distribution of the two random variables on either side of it as $\sigma_1, \sigma_2 \to \infty$. We note that $\theta$ is here strictly fixed and not subject to sampling fluctuation (cf. (5.4) with (4.3)). The required confidence intervals $(\bar{x}_1 - \bar{x}_2 - a,\ \bar{x}_1 - \bar{x}_2 + a)$ for $\mu_1 - \mu_2$, of width $2a$ and with limiting confidence coefficient $1-\alpha$, are obtained directly from (5.3), and the four design constants $n_{01}, n_{02}, k_1, k_2$ must be related to the two specification constants $a$ and $\alpha$ by

$$a\left(\frac{1}{k_1^2} + \frac{1}{k_2^2}\right)^{-1/2} = d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}, \qquad (5.5)$$

but are otherwise arbitrary. The problem of a rational choice of the $n_{0i}$ and $k_i$ subject to the constraint (5.5) is discussed below.
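Given $\theta$, $a$ and $\alpha$, relation (5.5) fixes $k_1$ and $k_2$ once the percentage point $d_{\nu_{01},\nu_{02};\theta;\alpha/2}$ is known, through $a k_1 = d\,\mathrm{cosec}\,\theta$ and $a k_2 = d \sec\theta$. In place of Sukhatme's tables, the sketch below estimates that point by Monte Carlo, generating each $t_\nu$ as $Z/\sqrt{\chi^2_\nu/\nu}$; this substitution, the number of draws and the function names are assumptions of the illustration. With $\nu_{01} = 6$, $\nu_{02} = 12$, $\theta = 45^\circ$, $\alpha = 0.05$ and $a = 1$, the estimates of $k_1^2$ and $k_2^2$ should be close to the value 10.583 listed in Table 1.

```python
import math
import random

def d_point(nu1, nu2, theta_deg, alpha, draws=100_000, seed=1):
    """Monte Carlo estimate of d_{nu1,nu2;theta;alpha/2}: Pr(|d| > point) = alpha, where
    d = sin(theta) t_nu1 - cos(theta) t_nu2 as in (5.3)."""
    rng = random.Random(seed)
    th = math.radians(theta_deg)

    def student(nu):
        # t_nu = Z / sqrt(chi2_nu / nu), with chi2_nu = 2 * Gamma(nu / 2, scale 1)
        return rng.gauss(0.0, 1.0) / math.sqrt(2.0 * rng.gammavariate(nu / 2.0, 1.0) / nu)

    vals = sorted(abs(math.sin(th) * student(nu1) - math.cos(th) * student(nu2))
                  for _ in range(draws))
    return vals[int((1.0 - alpha) * draws)]

def design_k(a, nu1, nu2, theta_deg, alpha):
    """k_1, k_2 satisfying (5.5): a k_1 = d cosec(theta), a k_2 = d sec(theta)."""
    d = d_point(nu1, nu2, theta_deg, alpha)
    th = math.radians(theta_deg)
    return d / (a * math.sin(th)), d / (a * math.cos(th))

k1, k2 = design_k(1.0, 6, 12, 45.0, 0.05)
print(k1 ** 2, k2 ** 2)     # Table 1 lists a^2 k_i^2 = 10.583 at theta = 45 degrees
```

At $\theta = 90^\circ$ the same routine should reproduce the ordinary Student point $t_{6;\,0.025} \approx 2.447$, since $d$ then reduces to $t_{\nu_{01}}$.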


It is of some interest to observe that sampling from two normal populations with the $\sigma_i^2$ regarded as having fiducial, or conceptual, distributions (distributions which have a subjective existence and arise from a thought process in the statistician's mind), obtained from an inversion argument applied to the chi-square distribution, leads to a result which is equivalent to that obtained by sampling from two fixed normal populations, provided that the field of probability inference be restricted to samples whose sizes are conditioned by the relation (2.1).


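The above estimation procedure is conservative in the sense that the actual (unknown) confidence coefficient, which is a function of the $\sigma_i$, is greater than $1-\alpha$. This follows from the first half of eq. (3.12), on noting that the value of $Q_m$ corresponding to (5.2) is

$$2\Phi\!\left[a\left(\frac{\sigma_1^2}{m_1} + \frac{\sigma_2^2}{m_2}\right)^{-1/2}\right] - 1 \qquad (5.6)$$

(where $\Phi(\cdot)$ denotes, as usual, the distribution function of a normal variate with zero mean and unit variance), and is therefore strictly decreasing in $\sigma_1$ and $\sigma_2$. The true confidence coefficient is

$$\sum_{m_1=1}^{\infty}\sum_{m_2=1}^{\infty} p_{m_1}(\sigma_1)\, p_{m_2}(\sigma_2) \left\{2\Phi\!\left[a\left(\frac{\sigma_1^2}{m_1} + \frac{\sigma_2^2}{m_2}\right)^{-1/2}\right] - 1\right\}. \qquad (5.7)$$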

Consider now the efficiency of the estimation procedure. According to (5.5), three degrees of freedom remain for the choice of the $n_{0i}$ and $k_i$ $(i = 1, 2)$. Even when the $n_{0i}$ have been chosen there still remains an infinity of estimation procedures of our type, corresponding to the infinity of possible pairs of values for $k_1$ and $k_2$ satisfying (5.5). (This is to be contrasted with the sequential estimation of the mean of a single normal population (Stein, 1945; Ruben, 1961), where there is only one estimation procedure of our type which meets the given specifications once the pilot sample size has been chosen.) A rational choice of the design constants would appear to be that corresponding to a minimisation of the expected total number of items sampled,¹ i.e. of

$$E(n_1 + n_2) = \sum_{m_1=1}^{\infty} m_1\, p_{m_1}(\sigma_1) + \sum_{m_2=1}^{\infty} m_2\, p_{m_2}(\sigma_2), \qquad (5.8)$$

around the suspected 'true' values of $\sigma_1^2$ and $\sigma_2^2$, subject to the constraint (5.5);² we shall here, however, only determine an approximate minimisation in a number of specific cases. Observe first that by (5.5) and (5.4),

$$a k_1 = d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\,\mathrm{cosec}\,\theta, \qquad a k_2 = d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\,\sec\theta, \qquad (5.9)$$

so that, on using formula (2.8),

$$E(n_1 + n_2) \sim k_1^2\sigma_1^2 + k_2^2\sigma_2^2 = \left(\frac{d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}}{a}\right)^2 \left(\sigma_1^2\,\mathrm{cosec}^2\theta + \sigma_2^2\sec^2\theta\right), \qquad (5.10)$$

the approximation being good when $k_i^2\sigma_i^2 > n_{0i}$. Now $\theta$ is completely at our disposal, since in the present scheme of sampling it is not a random variable but a definite constant. Obviously, in any particular case we should like to choose $\theta$ so as to minimise the right-hand member of (5.10). Unfortunately, the minimising value of $\theta$ is clearly a function of $\delta = \sigma_2^2/\sigma_1^2$, and, $\delta$ being by supposition unknown, the minimising value can never be exactly determined. Nevertheless, it appears reasonable to minimise $E(n_1 + n_2)$ around the suspected value of $\delta$, say $\delta_0$.


It may be objected that the process of choosing a particular value of $\theta$ in advance to minimise, as far as possible, the labour of experimentation or observation is illogical, since if $\delta$ were known, e.g. if $\delta$ were known to be $\delta_0$, we should certainly not use the present procedure at all but rather one similar to that described elsewhere (Stein, 1945; Ruben, 1961). Indeed, as will be shown presently, this latter procedure is considerably more economical in sampling when $\delta = \delta_0$. However, we do not assume that $\delta = \delta_0$ strictly, but are rather making use of what knowledge we have that $\delta$ may be approximately equal to $\delta_0$. Even if this be quite false, the procedure is not, of course, thereby invalidated. It simply means that the specifications could have been met with less experimentation than was actually the case. But it would be unreasonable not to use every possible scrap of information one might have, on theoretical grounds or otherwise, about $\delta$.


¹ We assume here that the cost function is proportional to the number of items sampled.

² One of my students, Mr. H. Stein, has recently carried out work on this problem by numerical methods, and it is hoped that his results will eventually appear in print.






We illustrate numerically with two examples, using eqs. (5.9) to compute $a^2k_1^2$ and $a^2k_2^2$ and then the approximate formula (5.10) to compute $E(n_1 + n_2)$. (A somewhat more precise, analytical discussion follows below.)


TABLE 1. THE (APPROXIMATE) EXPECTED SAMPLE SIZE
when ν₀₁ = 6, ν₀₂ = 12, α = 0.05, for varying values of δ

θ (degrees)                    0        15        30        45        60        75        90
a²k₁²                          ∞    71.809    20.052    10.583     7.480     6.296     5.988
a²k₂²                      4.748     5.153     6.682    10.583    22.440    87.606         ∞
E(n₁+n₂)a²/σ₁² when
  δ = 0.01                     ∞    71.814    20.059    10.689     7.740     7.173         ∞
  δ = 0.1                      ∞    72.324    20.720    11.641     9.724    15.062         ∞
  δ = 1                        ∞    76.962    26.734    21.166    29.920    93.962         ∞
  δ = 5                        ∞    97.574    53.462    63.498   119.680   444.626         ∞
  δ = 10                       ∞   123.339    86.872   116.413   231.880   882.956         ∞
  δ = 100                      ∞   587.109   688.252  1068.883  2251.480  8772.896         ∞


TABLE 2. THE (APPROXIMATE) EXPECTED SAMPLE SIZE
when ν₀₁ = 6, ν₀₂ = 12, α = 0.01, for varying values of δ

θ (degrees)                    0        15        30        45        60        75        90
a²k₁²                          ∞     139.3     38.56    21.086     15.93     14.18     13.74
a²k₂²                      9.333     10.00     12.85    21.086     47.75     197.6         ∞
E(n₁+n₂)a²/σ₁² when
  δ = 0.01                     ∞     139.4     38.69     21.30     16.41     16.16         ∞
  δ = 0.1                      ∞     140.3     39.85     23.20     90.70     33.95         ∞
  δ = 1                        ∞     149.3     51.41     42.18     63.67     211.8         ∞
  δ = 5                        ∞     189.3     102.8     126.5     354.7      1002         ∞
  δ = 10                       ∞     239.3     167.1     232.0     439.4      1990         ∞
  δ = 100                      ∞      1139      1324      2130      4791     19780         ∞


Both Tables show that $a^2k_1^2$ decreases steadily with $\theta$, while $a^2k_2^2$ increases steadily with $\theta$. The italicised numbers indicate the minimised values of $E(n_1 + n_2)a^2/\sigma_1^2$ for the corresponding values of $\delta$, as far as Sukhatme's Tables (Sukhatme, 1938), reproduced by Fisher and Yates (1957), of the percentile points of $d_{\nu_{01},\,\nu_{02};\,\theta}$ permit us to judge without interpolation. Thus, in Table 1, for $\delta = 10$, $E(n_1 + n_2)a^2/\sigma_1^2$ decreases from $\infty$ when $\theta = 0^\circ$ to a minimum value not far removed from 86.872 in the region of $\theta = 30^\circ$, and then increases to $\infty$ when $\theta = 90^\circ$. If, therefore, there is some reason to believe that $\delta$ is more likely to have a value of about 10 than any other value, one would choose $\theta = 30^\circ$ (about), i.e. one would choose values $k_1$ and $k_2$ for which $a^2k_1^2 = 20.052$, $a^2k_2^2 = 6.682$ in order to minimise the cost and labour of experimentation (e.g. if $a = 1$, $k_1^2 = 20.052$, $k_2^2 = 6.682$). Again, for $\delta = 10$ in the second example, the choice of $\theta$ would again be about $30^\circ$ to minimise $E(n_1 + n_2)a^2/\sigma_1^2$, leading to $a^2k_1^2 = 38.56$, $a^2k_2^2 = 12.85$ (e.g. if $a = 1$, $k_1^2 = 38.56$, $k_2^2 = 12.85$). [Note that the entries in the body of Table 2 are about double the corresponding entries in Table 1, so that the expected total sample size is nearly doubled when $\alpha = 0.01$ as compared to the previous case when $\alpha = 0.05$.]
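The search described above is simple enough to reproduce from the printed columns of Table 1. The sketch below recomputes $E(n_1+n_2)a^2/\sigma_1^2 \approx a^2k_1^2 + \delta\, a^2k_2^2$ (from (2.8) and (5.9), with $\delta = \sigma_2^2/\sigma_1^2$) over the tabulated angles and returns the best one; it uses only the printed $a^2k_i^2$ values, and the function names are hypothetical.

```python
THETAS = [15, 30, 45, 60, 75]                      # degrees; 0 and 90 give infinite totals
A2K1 = [71.809, 20.052, 10.583, 7.480, 6.296]      # a^2 k_1^2, Table 1 (alpha = 0.05)
A2K2 = [5.153, 6.682, 10.583, 22.440, 87.606]      # a^2 k_2^2, Table 1

def scaled_total(delta):
    """E(n_1 + n_2) a^2 / sigma_1^2 ~ a^2 k_1^2 + delta a^2 k_2^2 at each tabulated theta."""
    return [x + delta * y for x, y in zip(A2K1, A2K2)]

def best_theta(delta):
    """Tabulated angle (degrees) with the smallest approximate expected total, and that total."""
    totals = scaled_total(delta)
    i = min(range(len(totals)), key=totals.__getitem__)
    return THETAS[i], totals[i]

print(best_theta(10.0))   # delta = 10: theta about 30 degrees, total about 86.872
```

A finer search would need $d_{\nu_{01},\nu_{02};\theta;\alpha/2}$ at intermediate angles, which is the interpolation the text avoids.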


We now examine the efficiency of the estimation procedure somewhat more rigorously. For purposes of discussion, it will be convenient and fruitful to schematize the problem of the estimation of the means of two separate normal populations by categorizing three successive stages of ignorance, as follows:

(i) Both $\sigma_1^2$ and $\sigma_2^2$ are known. This is hardly likely to arise in practice, since one is not likely to know the variability of the populations and yet not know their locations.

(ii) The ratio $\sigma_2^2/\sigma_1^2$ is known, though the separate values of $\sigma_1^2$ and $\sigma_2^2$ are not known. (A particularly important case is when $\sigma_2^2/\sigma_1^2$ is known to be 1.)

(iii) Both $\sigma_1^2$ and $\sigma_2^2$ are unknown.

It is assumed that in all three cases the specifications to be met are given by the quantities $2a$ and $\alpha$.


For (i), the standardised normal variate

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(\sigma_1^2/n_1 + \sigma_2^2/n_2\right)^{1/2}} \qquad (5.11)$$

is the most appropriate one to use, in the sense that it will minimise sampling. Here, $n_1$ and $n_2$ are fixed integers (there being no need for sequential sampling to obtain confidence intervals for $\mu_1 - \mu_2$ of fixed length when the $\sigma_i^2$ are known). To meet the specifications, the values of $n_1$ and $n_2$ must be chosen so as to satisfy the equation

$$a\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{-1/2} = u_{\alpha/2}, \qquad (5.12)$$

where $u_{\alpha/2}$ is the two-sided $100\alpha\%$ significance point of the standard normal distribution. There is still one degree of latitude available for the choice of $n_1$ and $n_2$. We should like to minimise the total number of items sampled,

$$n = n_1 + n_2, \qquad (5.13)$$

subject to (5.12). The minimising values of $n_1$ and $n_2$ (using undetermined multipliers and regarding the $n_i$ as continuous variables) are

$$n_i^* = \left(\frac{u_{\alpha/2}}{a}\right)^2 \sigma_i(\sigma_1 + \sigma_2) \qquad (i = 1, 2), \qquad (5.14)$$

whence the minimal (fixed) total sample size is

$$n^* = n_1^* + n_2^* = \left(\frac{u_{\alpha/2}}{a}\right)^2 (\sigma_1 + \sigma_2)^2. \qquad (5.15)$$
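(5.14) and (5.15) translate directly into code. In this sketch (the function name is hypothetical), $u_{\alpha/2}$ is obtained from the standard normal quantile function; the check that the sizes satisfy (5.12), and that moving away from them while keeping (5.12) satisfied increases $n_1 + n_2$, is included as a usage example.

```python
import math
from statistics import NormalDist

def fixed_sizes(a, alpha, sigma1, sigma2):
    """Continuous optimum (5.14) and total (5.15) when sigma_1, sigma_2 are known."""
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # u_{alpha/2}
    c = (u / a) ** 2
    n1 = c * sigma1 * (sigma1 + sigma2)
    n2 = c * sigma2 * (sigma1 + sigma2)
    return n1, n2, n1 + n2

n1, n2, n_star = fixed_sizes(1.0, 0.05, 1.0, 1.0)
u = NormalDist().inv_cdf(0.975)
print(n_star, 1.0 / math.sqrt(1.0 / n1 + 1.0 / n2), u)   # total, and both sides of (5.12)

# Moving n_1 off the optimum while keeping (5.12) satisfied raises the total.
n1_alt = 1.1 * n1
n2_alt = 1.0 / (1.0 / u ** 2 - 1.0 / n1_alt)             # solves (5.12) for n_2 with a = sigma_i = 1
print(n1_alt + n2_alt > n_star)
```

With $a = 1$ these totals are the $n^*a^2/\sigma_1^2$ entries of Table 3 (for example about 15.37 when $\delta_0 = 1$ and $\alpha = 0.05$).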


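For (ii), we use the sequentially conditioned variate

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(1/p_1^2 + \delta/p_2^2\right)^{1/2}} \sim t_{\nu_{01}+\nu_{02}}, \qquad (5.16)$$

where $t_{\nu_{01}+\nu_{02}}$ is a Student variable with $\nu_{01} + \nu_{02} = n_{01} + n_{02} - 2$ degrees of freedom and sampling is carried out according to the scheme

$$n_i = \max\left(\{p_i^2 s_0^2\},\ n_{0i}\right) \qquad (i = 1, 2), \qquad (5.17)$$

with

$$s_0^2 = \frac{1}{n_{01} + n_{02} - 2}\left[\sum_{j=1}^{n_{01}} (x_{1j} - \bar{x}_{01})^2 + \frac{1}{\delta}\sum_{j=1}^{n_{02}} (x_{2j} - \bar{x}_{02})^2\right], \qquad (5.18)$$

$\bar{x}_{01}, \bar{x}_{02}$ denoting the pilot sample means and $s_0^2$ being an unbiassed estimate of $\sigma_1^2$ on $\nu_{01} + \nu_{02}$ degrees of freedom. It is easy to show, using the method of a previous paper (Ruben, 1961), that the left-hand member of (5.16) is in fact distributed in the limit, as $\sigma_1^2 \to \infty$, in Student's form with $\nu_{01} + \nu_{02}$ degrees of freedom.

To meet the specifications, the scaling factors $1/p_1$ and $1/p_2$, which determine (approximately) the average total number of items sampled, denoted henceforth by $E_2 n$, must satisfy the equation

$$a\left(\frac{1}{p_1^2} + \frac{\delta}{p_2^2}\right)^{-1/2} = t_{\nu_{01}+\nu_{02};\,\alpha/2}, \qquad (5.19)$$

and accordingly

$$E_2 n \sim (p_1^2 + p_2^2)\,\sigma_1^2, \qquad (5.20)$$

provided only that $\sigma_1^2$ is not excessively small. We should like to minimise $E_2 n$ by the proper choice of $p_1$ and $p_2$, subject to the constraint (5.19). The minimising values are

$$p_1^2 = \left(\frac{t_{\nu_{01}+\nu_{02};\,\alpha/2}}{a}\right)^2 (1 + \delta^{1/2}), \qquad p_2^2 = \left(\frac{t_{\nu_{01}+\nu_{02};\,\alpha/2}}{a}\right)^2 \delta^{1/2}(1 + \delta^{1/2}), \qquad (5.21)$$

and the approximate minimal average total sample size is therefore

$$E_2 n \sim \left(\frac{t_{\nu_{01}+\nu_{02};\,\alpha/2}}{a}\right)^2 (\sigma_1 + \sigma_2)^2. \qquad (5.22)$$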
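The case (ii) design (5.21)-(5.22) can be sketched in the same way. Because computing Student percentage points requires a library or tables, the sketch takes $t_{\nu_{01}+\nu_{02};\alpha/2}$ as an input (for instance 2.878 for 18 degrees of freedom at $\alpha = 0.01$, from standard tables); the function name is hypothetical.

```python
import math

def case_ii_design(t_q, a, delta):
    """Minimising p_1^2, p_2^2 of (5.21) and the approximate E_2 n / sigma_1^2 of (5.20)/(5.22).

    t_q   : two-sided point t_{nu01+nu02; alpha/2}, supplied from tables (not computed here)
    delta : the known ratio sigma_2^2 / sigma_1^2
    """
    c = (t_q / a) ** 2
    r = math.sqrt(delta)
    p1_sq = c * (1.0 + r)
    p2_sq = c * r * (1.0 + r)
    return p1_sq, p2_sq, p1_sq + p2_sq

p1_sq, p2_sq, e2 = case_ii_design(2.878, 1.0, 1.0)        # 18 d.f., alpha = 0.01, delta = 1
print(e2)                                                  # compare 33.13 in the E_2 n column of Table 3
print(1.0 / math.sqrt(1.0 / p1_sq + 1.0 / p2_sq))          # left side of (5.19) with a = 1, delta = 1
```

The optimum puts $p_2^2/p_1^2 = \delta^{1/2}$, the same allocation in proportion to $\sigma_i$ that (5.14) gives when the variances are known.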


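As for case (iii), this has already been discussed at length previously. Denoting the total average sample size by $E_3(n \mid \theta)$, it has been shown in formula (5.10) that

$$E_3(n \mid \theta) \sim \left(\frac{d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}}{a}\right)^2 \sigma_1^2\left(\mathrm{cosec}^2\theta + \delta\sec^2\theta\right), \qquad (5.23)$$

provided that the $\sigma_i^2$ are not excessively small, where $\theta$ is an arbitrary constant which is entirely at our disposal. For any assumed (suspected) value of $\delta$, say $\delta = \delta_0$, there exists a minimising value of $\theta$, say $\theta = \theta(\delta_0) = \theta_0$, i.e. a value of $\theta$ which will minimise $E_3(n \mid \theta)$. (The values of $\theta_0$ corresponding to various assumed values of $\delta$ have been found numerically for two particular cases in Tables 1 and 2, as far as Sukhatme's Tables permitted without interpolation.) Analytically, the $\theta_0$ corresponding to $\delta = \delta_0$ is a root of the transcendental equation

$$\frac{\partial}{\partial\theta}\left\{d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\left(\mathrm{cosec}^2\theta + \delta_0\sec^2\theta\right)\right\} = 0. \qquad (5.24)$$

A fairly good approximation to $\theta_0$ is obtained from (5.24) by neglecting the variation of $d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}$ with $\theta$ in comparison with the variation of $\mathrm{cosec}^2\theta + \delta_0\sec^2\theta$ with $\theta$. This is evidently justified if $n_{01}, n_{02}$ are not too small, since $d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2} \to u_{\alpha/2}$ for all $\theta$ as $\nu_{01}, \nu_{02} \to \infty$. Setting the derivative of $\mathrm{cosec}^2\theta + \delta_0\sec^2\theta$ equal to zero gives $\cot^4\theta = \delta_0$, so that, approximately,

$$\theta_0 = \mathrm{arc\,cot}\ \delta_0^{1/4}. \qquad (5.25)$$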
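The approximation (5.25) can be checked against a direct numerical minimisation of $\mathrm{cosec}^2\theta + \delta_0\sec^2\theta$, the factor that remains in (5.24) once the variation of $d$ is neglected; at the minimum this factor equals $(1+\delta_0^{1/2})^2$. A minimal sketch, with hypothetical function names and an arbitrary grid step:

```python
import math

def theta0_deg(delta0):
    """Approximate optimum (5.25): theta_0 = arc cot(delta_0^(1/4)), in degrees."""
    return math.degrees(math.atan2(1.0, delta0 ** 0.25))

def shape(theta_deg, delta0):
    """cosec^2(theta) + delta_0 sec^2(theta): the factor of (5.24) kept when d is held fixed."""
    th = math.radians(theta_deg)
    return 1.0 / math.sin(th) ** 2 + delta0 / math.cos(th) ** 2

def grid_argmin(delta0, step=0.001):
    """Brute-force minimiser of shape() over the open interval (0, 90) degrees."""
    return min((step * i for i in range(1, int(round(90 / step)))),
               key=lambda t: shape(t, delta0))

for d0 in (0.1, 1.0, 10.0):
    print(d0, theta0_deg(d0), grid_argmin(d0), shape(theta0_deg(d0), d0), (1 + d0 ** 0.5) ** 2)
```

For $\delta_0 = 10$ and $\delta_0 = 0.1$ this gives roughly $29.4^\circ$ and $60.6^\circ$, in line with the minima near $30^\circ$ and $60^\circ$ read from Table 1.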


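Observe further that the approximation (5.25) is particularly good if $\delta_0$ is either very small or very large. This follows from the fact that when $\delta_0 = 0$, or when $\delta_0 = \infty$, the solution given by (5.25) is entirely accurate. For, on setting $\delta_0 = 0$ in (5.24), one obtains

$$\frac{\partial}{\partial\theta} d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2} - 2\,d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\cot\theta = 0, \qquad (5.26)$$

which is satisfied by $\theta = \pi/2$, on noting that

$$\left.\frac{\partial}{\partial\theta} d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\right|_{\theta=\pi/2} = 0. \qquad (5.27)$$

Similarly, on dividing (5.24) by $\delta_0$ and letting $\delta_0 \to \infty$, one obtains

$$\frac{\partial}{\partial\theta} d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2} + 2\,d^2_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\tan\theta = 0, \qquad (5.28)$$

which is satisfied by $\theta = 0$, on noting that

$$\left.\frac{\partial}{\partial\theta} d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}\right|_{\theta=0} = 0. \qquad (5.29)$$

((5.29) and (5.27) follow from the fact that $d_{\nu_{01},\,\nu_{02};\,-\theta}$ and $d_{\nu_{01},\,\nu_{02};\,\theta}$ are identically distributed variates, as are also $d_{\nu_{01},\,\nu_{02};\,(\pi/2)-\theta}$ and $d_{\nu_{01},\,\nu_{02};\,(\pi/2)+\theta}$, from the definition of $d_{\nu_{01},\,\nu_{02};\,\theta}$ in the form $d_{\nu_{01},\,\nu_{02};\,\theta} = \sin\theta\, t_{\nu_{01}} - \cos\theta\, t_{\nu_{02}}$. Therefore $d_{\nu_{01},\,\nu_{02};\,\theta;\,\alpha/2}$, considered as a function of $\theta$, is symmetrical about $\theta = 0$ and $\theta = \pi/2$.)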


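Equation (5.25) enables the minimising values of $k_1$ and $k_2$, $k_1^0$ and $k_2^0$, to be determined approximately, and thereby also the minimal average total sample size at $\delta = \delta_0$, henceforth denoted by $E_3(n \mid \delta_0)$. In fact, from (5.9) and (5.25),

$$a k_1^0 = d_{\nu_{01},\,\nu_{02};\,\mathrm{arc\,cot}\,\delta_0^{1/4};\,\alpha/2}\,(1 + \delta_0^{1/2})^{1/2}, \qquad a k_2^0 = d_{\nu_{01},\,\nu_{02};\,\mathrm{arc\,cot}\,\delta_0^{1/4};\,\alpha/2}\,(1 + \delta_0^{-1/2})^{1/2}, \qquad (5.30)$$

so that

$$E_3(n \mid \delta) \sim \left(\frac{d_{\nu_{01},\,\nu_{02};\,\mathrm{arc\,cot}\,\delta_0^{1/4};\,\alpha/2}}{a}\right)^2 \sigma_1^2\left[1 + \delta_0^{1/2} + \delta\,(1 + \delta_0^{-1/2})\right] \qquad (5.31)$$

(where $\delta = \sigma_2^2/\sigma_1^2$), and, in particular,

$$E_3(n \mid \delta_0) \sim \left(\frac{d_{\nu_{01},\,\nu_{02};\,\mathrm{arc\,cot}\,\delta_0^{1/4};\,\alpha/2}}{a}\right)^2 (\sigma_1 + \sigma_2)^2. \qquad (5.32)$$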


It is intuitively obvious that as our state of ignorance progressively increases from Case (i) to Case (iii), so does the cost and labour of experimentation, i.e.

$$n^* < E_2 n < E_3(n \mid \delta_0), \qquad (5.33)$$

where $\delta_0$ has been substituted for $\delta$ in the expressions for $n^*$ and $E_2 n$. The author conjectures that

(a) no double-sample procedure meeting the given specifications could improve on that given for Case (ii), in which $\delta$ is known ($\delta = \delta_0$), at any rate provided only that sufficiently small values of $\sigma_1^2$ are excluded from consideration, and

(b) no double-sample procedure meeting the given specifications could improve on that given for Case (iii), in which $\sigma_1^2$ and $\sigma_2^2$ are both unknown and the procedure described is carried through on the assumption that $\delta = \delta_0$, if $\delta$ is indeed equal to $\delta_0$, at any rate provided only that sufficiently small values of $\sigma_1^2$ are excluded from consideration.
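Below we present a short illustrative Table which gives the approximate average amount of sampling in all three cases for various $\delta_0$. The particular pair of pilot sample sizes has been chosen as (7, 13), and the two values $\alpha = 0.05$ and $\alpha = 0.01$ are considered. The approximate efficiency of the procedure given for Case (ii) is

$$\mathrm{Eff}_2 = \frac{n^*}{E_2 n} = \left(\frac{u_{\alpha/2}}{t_{\nu_{01}+\nu_{02};\,\alpha/2}}\right)^2, \qquad (5.34)$$

independently of $\delta_0$, while that for Case (iii) when $\delta = \delta_0$ is

$$\mathrm{Eff}_3(\delta_0) = \frac{n^*}{E_3(n \mid \delta_0)} = \left(\frac{u_{\alpha/2}}{d_{\nu_{01},\,\nu_{02};\,\mathrm{arc\,cot}\,\delta_0^{1/4};\,\alpha/2}}\right)^2. \qquad (5.35)$$

Note in passing that

$$\lim_{\delta_0 \to 0} \mathrm{Eff}_3(\delta_0) = \left(\frac{u_{\alpha/2}}{t_{\nu_{01};\,\alpha/2}}\right)^2 \qquad (5.36)$$

and

$$\lim_{\delta_0 \to \infty} \mathrm{Eff}_3(\delta_0) = \left(\frac{u_{\alpha/2}}{t_{\nu_{02};\,\alpha/2}}\right)^2. \qquad (5.37)$$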
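Both efficiencies are squared ratios of percentage points, so they are easy to evaluate once the relevant $t$ or $d$ point is known. In this sketch (the function name is hypothetical) the normal point comes from `statistics.NormalDist`, while the Student and $d$ points are supplied by the caller from standard tables (2.101 and 2.878 for 18 degrees of freedom, 2.179 for 12 degrees of freedom).

```python
from statistics import NormalDist

def efficiency(q, alpha):
    """(u_{alpha/2} / q)^2, the common form of (5.34)-(5.37).

    q is the relevant two-sided percentage point: t_{nu01+nu02; alpha/2} for Eff_2,
    d_{nu01, nu02; arc cot delta0^(1/4); alpha/2} for Eff_3, or t_{nu0i; alpha/2} for the limits.
    """
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (u / q) ** 2

print(round(100 * efficiency(2.101, 0.05)))   # Eff_2, 18 d.f., alpha = 0.05 -> 87
print(round(100 * efficiency(2.878, 0.01)))   # Eff_2, 18 d.f., alpha = 0.01 -> 80
print(round(100 * efficiency(2.179, 0.05)))   # limit (5.37), nu_02 = 12, alpha = 0.05 -> 81
```

These agree with the $\mathrm{Eff}_2$ column and the $\delta_0 = \infty$ entry of $\mathrm{Eff}_3$ for $\alpha = 0.05$ in Table 3.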


The efficiencies have been defined in relation to n*, since the latter provides, as it were,
an absolute yardstick. No procedure, sequential or otherwise, which meets the
stated requirements could hope to improve, in the sense of reducing average cost of
sampling, on the procedure given for Case (i), whatever be the value of δ.


TABLE 3. APPROXIMATE AVERAGE AMOUNT OF SAMPLING AND EFFICIENCIES FOR
ν01 = 6, ν02 = 12, α = 0.05, 0.01, AND FOR VARYING δ0

 δ0    | n*a²/σ1²      | E2(n)a²/σ1²   | E3(n|δ0)a²/σ1² | Eff2 %  | Eff3(δ0) %
 0.01  | 4.649  8.030  | 5.862  10.02  | 7.173  16.17   | 87  80  | 65  50
 0.1   | 6.653  11.49  | 8.390  14.35  | 9.724  21.70   | 87  80  | 68  53
 1     | 15.36  26.54  | 19.38  33.13  | 21.17  42.18   | 87  80  | 73  63
 5     | 40.23  69.49  | 50.73  86.73  | 53.46  102.8   | 87  80  | 75  68
 10    | 66.53  114.9  | 83.90  143.5  | 86.87  167.1   | 87  80  | 77  69
 100   | 464.9  803.0  | 586.2  1002   | 587.1  1139    | 87  80  | 79  70
 ∞     | ∞      ∞      | ∞      ∞      | ∞      ∞       | 87  80  | 81  71

(The entries on the left-hand side of each column after the first correspond to
α = 0.05 and those on the right to α = 0.01.)
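The n* and Eff2 columns of Table 3 can be checked from the Case (i) allocation and (5.34). The Python sketch below assumes n*a²/σ1² = z²(1 + δ0^{1/2})² with z = Φ^{-1}(1 − α/2), which agrees with the n* column. It also hard-codes the standard t percentiles t_{18;0.025} = 2.101 and t_{18;0.005} = 2.878 (ν01 + ν02 = 18). The function names are illustrative only.

```python
from statistics import NormalDist

def z_half(alpha):
    """Two-sided normal percentile Phi^{-1}(1 - alpha/2)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def n_star_scaled(delta0, alpha):
    """Case (i): n* a^2 / sigma1^2 = z^2 (1 + sqrt(delta0))^2."""
    return z_half(alpha) ** 2 * (1 + delta0 ** 0.5) ** 2

# Standard t percentiles for nu01 + nu02 = 18 degrees of freedom (assumed table values).
T18 = {0.05: 2.101, 0.01: 2.878}

def eff2(alpha):
    """(5.34): Eff_2 = (z / t_{nu01+nu02; alpha/2})^2, the same for every delta0."""
    return (z_half(alpha) / T18[alpha]) ** 2

for alpha in (0.05, 0.01):
    print(alpha, [round(n_star_scaled(d, alpha), 2) for d in (0.01, 0.1, 1, 5, 10)],
          round(100 * eff2(alpha)))
```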


We observe that in this particular case Eff3(δ0) increases steadily with δ0 for both
α = 0.05 and α = 0.01. From (5.35), we see that this is due to the fact that both
d_{6,12;θ0;0.025} and d_{6,12;θ0;0.005} increase steadily with θ0. However, this will not
always be the case. Fisher (1941) has pointed out that the manner of variation of
the percentile points of the d-distribution may be considerably affected by the
significance level (in our case, the confidence coefficient) on which one works. For
example, d_{12,8;θ0;0.025} decreases steadily with θ0 for 0 < θ0 < π/2; on the other
hand, d_{12,8;θ0;0.005} is a single-humped function of θ0 in the range 0 < θ0 < π/2.
Consequently Eff3(δ0) will not be a monotonic function of δ0 when n01 = 13, n02 = 9
and α = 0.01.


So far we have considered only the efficiency of the procedure when δ = δ0.
However, for a completely adequate characterisation of the procedure we need to know
its efficiency for all possible δ. This will indicate how the efficiency falls off as δ departs
from the suspected value δ0, at which it clearly attains its maximum value. From
(5.31) and (5.15), the efficiency as a function of δ is given approximately by

$$\mathrm{Eff}_3(\delta) = \left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_{01},\nu_{02};\,\operatorname{arc\,cot}\delta_0^{1/4};\,\alpha/2}}\right)^2 \frac{(1+\delta^{1/2})^2}{(1+\delta_0^{1/2})(1+\delta\,\delta_0^{-1/2})}. \qquad (5.38)$$

The efficiency curve is single-humped. It rises with infinite slope at δ = 0, where the
efficiency is

$$\left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_{01},\nu_{02};\,\operatorname{arc\,cot}\delta_0^{1/4};\,\alpha/2}}\right)^2 (1+\delta_0^{1/2})^{-1}, \qquad (5.39)$$

to the maximum value Eff3(δ0), as given in (5.35), at δ = δ0; it then falls asymptotically
to the value

$$\left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_{01},\nu_{02};\,\operatorname{arc\,cot}\delta_0^{1/4};\,\alpha/2}}\right)^2 \frac{\delta_0^{1/2}}{1+\delta_0^{1/2}}. \qquad (5.40)$$


It will be observed that the process is not asymptotically (n01, n02 → ∞) efficient
uniformly (i.e. for all δ), but only for the single value δ = δ0. The reason for this state
of affairs is essentially as follows. For Case (i),

$$\frac{n_2^*}{n_1^*} = \frac{\sigma_2}{\sigma_1} = \delta^{1/2}. \qquad (5.41)$$

However, for Case (iii) our procedure gives, approximately, for the ratio of the two
optimal average sample sizes,

$$\frac{E_3 n_2}{E_3 n_1} = \frac{k_2^2\sigma_2^2}{k_1^2\sigma_1^2} = \delta\tan^2\theta_0 = \delta\,\delta_0^{-1/2}. \qquad (5.42)$$

Thus, the left-hand members of (5.41) and (5.42) are equal if, and only if, δ = δ0.
There is a degree of latitude available in the fixed sampling size case (when the σi² are
If, however, we relate the efficiency of the procedure for Case (iii) not to the absolutely
optimum procedure for Case (i), characterised by the choice of sample sizes n1 = n1*,
n2 = n2*, but rather to the sub-sequence of repeated experiments for Case (i) defined
by

$$\frac{n_2}{n_1} = \frac{E_3 n_2}{E_3 n_1} = \delta\,\delta_0^{-1/2}, \qquad (5.43)$$

then the efficiency is now

$$\mathrm{Eff}'_3(\delta) = \left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_{01},\nu_{02};\,\operatorname{arc\,cot}\delta_0^{1/4};\,\alpha/2}}\right)^2 = \mathrm{Eff}_3(\delta_0) \qquad (5.44)$$

for all δ. This follows from (5.12) on substituting

$$\frac{n_2}{n_1} = \delta\,\delta_0^{-1/2}, \qquad (5.45)$$

which gives, for this sub-sequence,

$$n = \left(\frac{\Phi^{-1}(1-\alpha/2)}{a}\right)^2 \sigma_1^2\,(1+\delta_0^{1/2})(1+\delta\,\delta_0^{-1/2}); \qquad (5.46)$$

and comparing (5.46) with (5.31) the 'conditional' efficiency is verified as that given
in (5.44), no matter what is the true value of δ.
We illustrate further by a particularly important example. Let n01 = n02 = n0,
and suppose that there is external evidence for believing that δ may be not far
removed from 1 (σ1 = σ2). Now since d_{ν0,ν0;(π/4)−θ} and d_{ν0,ν0;(π/4)+θ} are identically
distributed, d_{ν0,ν0;θ;α/2}, considered as a function of θ, is symmetrical round
θ = π/4; therefore

$$\left.\frac{\partial}{\partial\theta}\,d_{\nu_0,\nu_0;\,\theta;\,\alpha/2}\right|_{\theta=\pi/4} = 0,$$

where here ν0 = n0 − 1. Also,

$$\left.\frac{\partial}{\partial\theta}\,(\operatorname{cosec}^2\theta + \sec^2\theta)\right|_{\theta=\pi/4} = 0.$$

Consequently, for δ0 = 1, (5.24) is satisfied by θ0 = 45°.* (Note that the value
θ0 = 45°, as given by (5.25), is the exact value of the root.) On referring to (5.30),
k1 = k2 = k* (say), where

$$k^* = \frac{\sqrt{2}\,d_{\nu_0,\nu_0;\,45^\circ;\,\alpha/2}}{a}. \qquad (5.47)$$

From (5.31) and (5.38), we have (again accurately in this case)

$$E_3(n\mid\delta) = 2\sigma_1^2\left(\frac{d_{\nu_0,\nu_0;\,45^\circ;\,\alpha/2}}{a}\right)^2 (1+\delta) \qquad (5.48)$$

and

$$\mathrm{Eff}_3(\delta) = \frac{1}{2}\left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_0,\nu_0;\,45^\circ;\,\alpha/2}}\right)^2 \frac{(1+\delta^{1/2})^2}{1+\delta}. \qquad (5.49)$$

*The distribution of d_{ν0,ν0;π/4} has been found elsewhere as a series expansion (Ruben, 1960).
Eff3(δ) is graphed in Figs. 1 and 2 for various values of n0 and for α = 0.05, α = 0.01.
In particular,

$$\mathrm{Eff}_3(1) = \left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_0,\nu_0;\,45^\circ;\,\alpha/2}}\right)^2, \qquad (5.50)$$

giving the maximum height of the efficiency curve. The procedure is asymptotically
efficient (n0 → ∞) if δ = 1, but not otherwise. However, the procedure is fairly nearly
asymptotically efficient in the range 0.1 < δ < 10: in this range the efficiency is
asymptotically not less than about 79%.
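For δ0 = 1 the limiting form of (5.49) as n0 → ∞ is (1 + δ^{1/2})²/[2(1 + δ)]. The Python sketch below evaluates it on a logarithmic grid over 0.1 ≤ δ ≤ 10 to check the figure of about 79% quoted above. The function name is ours.

```python
import math

def asymptotic_eff(delta):
    """Limit of (5.49) as n0 -> infinity: (1 + sqrt(delta))^2 / (2 (1 + delta))."""
    return (1 + math.sqrt(delta)) ** 2 / (2 * (1 + delta))

grid = [10 ** (k / 1000) for k in range(-1000, 1001)]    # 0.1 <= delta <= 10
worst = min(asymptotic_eff(d) for d in grid)
print(round(worst, 4))    # 0.7875, i.e. about 79%, attained at delta = 0.1 and delta = 10
```

The curve is symmetric in log δ about δ = 1, because replacing δ by 1/δ leaves (1 + δ^{1/2})²/(1 + δ) unchanged. The two ends of the range therefore give the same value.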
In relation to the subsequence of experiments for Case (i) in which, from
(5.43), n2/n1 = δ, the efficiency is, by (5.44),

$$\mathrm{Eff}'_3(\delta) = \left(\frac{\Phi^{-1}(1-\alpha/2)}{d_{\nu_0,\nu_0;\,45^\circ;\,\alpha/2}}\right)^2. \qquad (5.51)$$

Again, for purposes of comparison with Case (ii), note that when δ = 1 (on using
(5.21), (5.17) and (5.18)),

$$p_1 = p_2 = p^*, \qquad (5.52)$$

with

$$p^* = \frac{\sqrt{2}\,t_{2(n_0-1);\,\alpha/2}}{a} \qquad (5.53)$$

and

$$n_i = \max\left(p^{*2}s_0^2,\ n_0\right), \qquad i = 1, 2, \qquad (5.54)$$
TABLE 4(a) (α = 0.05): n0 − 1 and 'conditional' efficiency Eff′3(1) %
TABLE 4(b) (α = 0.01): n0 − 1 and 'conditional' efficiency Eff′3(1) %

with

$$s_0^2 = \frac{1}{2(n_0-1)}\left[\sum_{j=1}^{n_0}(x_{1j}-\bar x_1)^2 + \sum_{j=1}^{n_0}(x_{2j}-\bar x_2)^2\right]. \qquad (5.55)$$
Furthermore,

$$\mathrm{Eff}_2 = \left(\frac{\Phi^{-1}(1-\alpha/2)}{t_{2(n_0-1);\,\alpha/2}}\right)^2, \qquad (5.56)$$

which is to be compared with (5.51). This procedure (for Case (ii)) is now a member
of the class of procedures discussed previously (Stein, 1945; Ruben, 1961), where the
sequential testing of the general linear hypothesis and the concomitant estimation
problem are considered.*

*Reference should also be made to a paper by Chapman (1950) in which an analogous two-stage
procedure is developed for the confidence estimation of the difference in means in a rather special case.
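The Case (ii) rule (5.53) to (5.55) for equal pilot samples can be sketched as follows. This is a minimal illustration. It assumes that the final sizes are rounded up to whole numbers and that the user supplies the percentile t_{2(n0−1);α/2}, for example from a t table. `case_ii_sample_size` is a hypothetical name.

```python
import math
from statistics import mean

def case_ii_sample_size(pilot1, pilot2, a, t_percentile):
    """Final size n_i = max(p*^2 s0^2, n0), i = 1, 2, from (5.53)-(5.55), rounded up.

    pilot1, pilot2: pilot samples of equal size n0; a: prescribed half-width;
    t_percentile: t_{2(n0-1); alpha/2}, supplied by the user.
    """
    n0 = len(pilot1)
    if len(pilot2) != n0 or n0 < 2:
        raise ValueError("this sketch assumes n01 = n02 = n0 >= 2")
    m1, m2 = mean(pilot1), mean(pilot2)
    ss = sum((x - m1) ** 2 for x in pilot1) + sum((x - m2) ** 2 for x in pilot2)
    s0_sq = ss / (2 * (n0 - 1))                        # pooled variance, (5.55)
    p_star = math.sqrt(2) * t_percentile / a           # (5.53)
    return max(math.ceil(p_star ** 2 * s0_sq), n0)     # (5.54), rounded up

# t_{8; 0.025} = 2.306 is a standard table value (assumed here, n0 = 5).
print(case_ii_sample_size([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], a=1.0, t_percentile=2.306))  # 67
```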
Fig. 1. Efficiency curves Eff3(δ) for n01 = n02 = n0, α = 0.05 (δ on a logarithmic scale).

Fig. 2. Efficiency curves Eff3(δ) for n01 = n02 = n0, α = 0.01 (δ on a logarithmic scale).
6. THE TESTING OF THE DIFFERENCE IN MEANS OF TWO UNKNOWN
NORMAL POPULATIONS

For convenience, denote μ1 − μ2 by η. We wish to test η so that the power
function is independent of the unknown σi². For this purpose, the two-stage sampling
procedure (n01, n02; k1, k2) will be used.

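Consider then the problem of testing the simple hypothesis H0: η = η0 against
the simple alternative H1: η = η1, with η1 > η0. We assume that the risks of the two
types of error are specified in advance as α and β. The rejection criterion

$$R:\ \bar x_1 - \bar x_2 > C \qquad (6.1)$$

will be used for H0, where the five constants n01, n02, k1, k2 and C have to be chosen
so as to satisfy the imposed requirements. From (5.3), the limiting power function
of the test is

$$G(\eta) = 1 - \Psi_{\nu_{01},\nu_{02};\theta}\!\left[\frac{C-\eta}{(1/k_1^2+1/k_2^2)^{1/2}}\right], \qquad (6.2)$$

where Ψ_{ν01,ν02;θ}(·) is the distribution function of d_{ν01,ν02;θ}. We therefore require

$$1 - \Psi_{\nu_{01},\nu_{02};\theta}\!\left[\frac{C-\eta_0}{(1/k_1^2+1/k_2^2)^{1/2}}\right] = \alpha \qquad (6.3)$$

and

$$\Psi_{\nu_{01},\nu_{02};\theta}\!\left[\frac{C-\eta_1}{(1/k_1^2+1/k_2^2)^{1/2}}\right] = \beta, \qquad (6.4)$$

whence

$$\frac{C-\eta_0}{(1/k_1^2+1/k_2^2)^{1/2}} = d_{\nu_{01},\nu_{02};\theta;\alpha} \qquad (6.5)$$

and

$$\frac{C-\eta_1}{(1/k_1^2+1/k_2^2)^{1/2}} = -d_{\nu_{01},\nu_{02};\theta;\beta}, \qquad (6.6)$$

or, equivalently (recall that tan θ = k2/k1),

$$(C-\eta_0)k_1 = d_{\nu_{01},\nu_{02};\theta;\alpha}\operatorname{cosec}\theta \qquad (6.7)$$

and

$$(C-\eta_1)k_1 = -d_{\nu_{01},\nu_{02};\theta;\beta}\operatorname{cosec}\theta. \qquad (6.8)$$

Solving (6.7) and (6.8) for C and k1,

$$C = \frac{\eta_1 d_{\nu_{01},\nu_{02};\theta;\alpha} + \eta_0 d_{\nu_{01},\nu_{02};\theta;\beta}}{d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta}}, \qquad (6.9)$$

$$k_1 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\operatorname{cosec}\theta}{\eta_1-\eta_0}, \qquad (6.10)$$

so that

$$k_2 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\sec\theta}{\eta_1-\eta_0}. \qquad (6.11)$$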

For given α, β, η0, η1, the constants n01, n02, k1 and k2 may be chosen arbitrarily, provided
only that the latter satisfy equs. (6.10) and (6.11). For each such choice the
rejection criterion is provided by (6.1) and (6.9). If further the pilot sample sizes are
selected in advance, then there remains only to choose θ, and k1, k2 are automatically
determined by (6.10) and (6.11) after θ has been chosen. For all such θ, the specified
requirements for the risks of error will be met. In practice one would, exactly as for
the sequential estimation procedure, choose the value of θ which will minimise
En ≈ k1²σ1² + k2²σ2² when δ = σ2²/σ1² has some suspected value.

The power function in (6.2) is monotonic in η = μ1 − μ2. This implies that the
present procedure may be used to test the more realistic composite hypothesis H0:
η ≤ 0 against η > 0, when it is specified that the power ≤ α for η ≤ some value η0
(< 0), and ≥ 1 − β for η ≥ a value η1 (= −η0 > 0). It may also be used to test the
hypothesis H0′: η = 0 against one-sided alternatives η > 0, when it is specified that the
probability of incorrectly rejecting the null hypothesis shall be α and the probability
of correctly rejecting shall be ≥ 1 − β for η ≥ η1, η1 being some suitably chosen
(positive) value of η. For the first of these two cases (6.9), (6.10) and (6.11) reduce to

$$C = \eta_1\,\frac{d_{\nu_{01},\nu_{02};\theta;\alpha} - d_{\nu_{01},\nu_{02};\theta;\beta}}{d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta}}, \qquad (6.12)$$

$$k_1 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\operatorname{cosec}\theta}{2\eta_1}, \qquad (6.13)$$

$$k_2 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\sec\theta}{2\eta_1}, \qquad (6.14)$$

while for the second case they reduce to

$$C = \eta_1\,\frac{d_{\nu_{01},\nu_{02};\theta;\alpha}}{d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta}}, \qquad (6.15)$$

$$k_1 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\operatorname{cosec}\theta}{\eta_1}, \qquad (6.16)$$

$$k_2 = \frac{(d_{\nu_{01},\nu_{02};\theta;\alpha} + d_{\nu_{01},\nu_{02};\theta;\beta})\sec\theta}{\eta_1}. \qquad (6.17)$$
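The constants in (6.12) to (6.17) are simple closed forms once the two percentiles of d are known. The Python sketch below takes the 45° percentiles d_{6,12;45°;0.025} = 2.301 and d_{6,12;45°;0.005} = 3.247 from Table 5. It reproduces the constants used in the two worked examples that follow. The function names are illustrative.

```python
import math

def composite_design(d_alpha, d_beta, theta_deg, eta1):
    """(6.12)-(6.14): H0: eta <= 0, with eta0 = -eta1."""
    th = math.radians(theta_deg)
    s = d_alpha + d_beta
    C = eta1 * (d_alpha - d_beta) / s
    return C, s / (2 * eta1 * math.sin(th)), s / (2 * eta1 * math.cos(th))

def simple_null_design(d_alpha, d_beta, theta_deg, eta1):
    """(6.15)-(6.17): H0': eta = 0 against eta > 0."""
    th = math.radians(theta_deg)
    s = d_alpha + d_beta
    return eta1 * d_alpha / s, s / (eta1 * math.sin(th)), s / (eta1 * math.cos(th))

C, k1, k2 = composite_design(2.301, 3.247, 45, eta1=1.0)
print(round(C, 4), round(4 * k1 ** 2, 2), round(4 * k2 ** 2, 2))   # -0.1705 61.56 61.56
C2, k1b, k2b = simple_null_design(2.301, 3.247, 45, eta1=1.0)
print(round(C2, 4), round(k1b ** 2, 2))                             # 0.4147 61.56
```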
Table 5 allows these tests to be applied for α = 0.025, β = 0.005, n01 = 7
and n02 = 13. The entries in the rows labelled (η1 − η0)²k1² and (η1 − η0)²k2² have been
computed from (6.10) and (6.11), respectively. The italicised entries represent the
minimal values of (η1 − η0)²En/σ1², as far as Sukhatme's Tables permit one to judge
without interpolation, for the corresponding assumed value of δ. Two rows
have been inserted to enable C to be read off directly for the tests of H0 and H0′.

To illustrate the use of Table 5 numerically, suppose we wish to test the hypothesis
μ1 ≤ μ2 against the alternative μ1 > μ2. The requirements to be met are that
if μ1 − μ2 ≤ −1 the probability of rejecting the hypothesis must not exceed 0.025, and
if μ1 − μ2 ≥ +1 the risk of 'accepting' the hypothesis must not be greater than 0.005.
The pilot sample sizes have been chosen to be 7 and 13, and there is extraneous
evidence available that δ is probably close to 1. The appropriate test is that
corresponding to (6.12), (6.13) and (6.14) with ν01 = 6, ν02 = 12, α = 0.025, β = 0.005,
η0 = −1, η1 = +1. According to Table 5 the average total sample is minimised by
choosing θ = 45° (about). This gives (3rd and 4th rows of the Table) 4k1² = 61.56,
4k2² = 61.56. Also, for θ = 45°, C = −0.1705 (penultimate row of the Table), and the
hypothesis is rejected if x̄1 − x̄2 > −0.1705.
TABLE 5. THE CHARACTERISTIC CONSTANTS FOR THE TWO-STAGE SAMPLE TEST
when α = 0.025, β = 0.005, n01 = 7, n02 = 13

 θ                            0°        15°       30°       45°       60°       75°       90°
 d_α                          2.179     2.193     2.239     2.301     2.367     2.424     2.447
 d_β                          3.055     3.055     3.105     3.247     3.455     3.638     3.707
 (η1 − η0)²k1²                ∞         411.1     114.2     61.56     45.21     39.40     37.87
 (η1 − η0)²k2²                27.40     29.51     38.08     61.56     135.6     548.6     ∞
 (η1 − η0)²En/σ1², δ = 0.01   ∞         411.4     114.6     62.18     46.57     44.89*    ∞
 (η1 − η0)²En/σ1², δ = 0.1    ∞         414.1     118.0     67.72     58.77*    94.26     ∞
 (η1 − η0)²En/σ1², δ = 1      ∞         440.6     152.3     123.1*    180.8     588.0     ∞
 (η1 − η0)²En/σ1², δ = 5      ∞         558.7     304.6*    369.4     723.2     2782      ∞
 (η1 − η0)²En/σ1², δ = 10     ∞         706.2     495.0*    677.2     1401      5525      ∞
 (η1 − η0)²En/σ1², δ = 100    ∞         3362*     3922      6217      13610     54900     ∞
 C/η1, test (6.12)            −0.1673   −0.1643   −0.1620   −0.1705   −0.1803   −0.2020   −0.2017
 C/η1, test (6.15)            0.4163    0.4179    0.4190    0.4147    0.4066    0.3999    0.3976

(d_α = d_{6,12;θ;0.025}, d_β = d_{6,12;θ;0.005}; an asterisk marks the italicised, i.e. minimal, entry in each E(n) row.)


If δ is indeed 1, then (7th row) E(n) = 123.1 × σ1²/4 = 30.78σ1². If, on the other
hand, δ = 0.1, then (6th row) E(n) = 67.72 × σ1²/4 = 16.93σ1², and if δ = 0.01, E(n) =
62.18 × σ1²/4 = 15.55σ1²; i.e. if δ < 1, the average number of items sampled is
less than we have estimated. On the other hand, if δ > 1, the reverse is the case,
e.g. if δ = 5 instead of δ = 1, then (8th row) E(n) = 369.4 × σ1²/4 = 92.35σ1².
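The choice of θ in the first example, and the E(n) figures just quoted, follow from the approximation En ≈ k1²σ1² + k2²σ2² used earlier in this section together with (6.10) and (6.11). The Python sketch below uses only the 15°, 30° and 45° columns of Table 5 for brevity. It is an illustration, not the author's computation, and the helper names are ours.

```python
import math

# d-percentiles (alpha = 0.025, beta = 0.005) for nu01 = 6, nu02 = 12, read from Table 5.
TABLE5 = {15: (2.193, 3.055), 30: (2.239, 3.105), 45: (2.301, 3.247)}

def scaled_expected_n(theta_deg, delta):
    """(eta1 - eta0)^2 E(n) / sigma1^2 under E(n) ~ k1^2 sigma1^2 + k2^2 sigma2^2."""
    d_a, d_b = TABLE5[theta_deg]
    th = math.radians(theta_deg)
    s2 = (d_a + d_b) ** 2
    return s2 / math.sin(th) ** 2 + delta * s2 / math.cos(th) ** 2

def best_theta(delta):
    """Column (among those in TABLE5) minimising the scaled average sample size."""
    theta = min(TABLE5, key=lambda t: scaled_expected_n(t, delta))
    return theta, scaled_expected_n(theta, delta)

for delta in (1, 5, 10):
    print(delta, best_theta(delta))

# First worked example: eta1 - eta0 = 2, so E(n) = value / 4 in units of sigma1^2.
print(round(scaled_expected_n(45, 1) / 4, 2))   # 30.78
```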


As a second example, consider testing the hypothesis μ1 = μ2 against the
alternative μ1 > μ2. The requirements to be met are that if indeed μ1 = μ2, then the risk
of rejecting the hypothesis is 0.025, while if it is false and μ1 − μ2 ≥ 1, then the risk
of 'accepting' it must not be greater than 0.005. We suspect that δ = 1, and have
(as before) chosen pilot sample sizes 7 and 13. Here, ν01 = 6, ν02 = 12, α = 0.025,
β = 0.005, η0 = 0, η1 = +1, and the appropriate test is that corresponding to equs.
(6.15), (6.16) and (6.17). The minimising value of θ is again approximately 45°, for
which (η1 − η0)²k1² = k1² = 61.56 and (η1 − η0)²k2² = k2² = 61.56. The value of C is
0.4147, and the hypothesis is rejected if x̄1 − x̄2 > 0.4147.

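From (6.2) and (6.9) the limiting power function for one-sided tests of η = η0
against the set of alternatives η > η0, on significance level α, may be expressed as

$$1 - \Psi_{\nu_{01},\nu_{02};\theta}\!\left[d_{\nu_{01},\nu_{02};\theta;\alpha} - \frac{\eta-\eta_0}{(1/k_1^2+1/k_2^2)^{1/2}}\right]. \qquad (6.18)$$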
The value of Q_n corresponding to the region R in the (x̄1, x̄2)-plane defined by (6.1) is

$$Q_n = 1 - \Phi\!\left[\frac{(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right], \qquad (6.19)$$

and the true power function is therefore

$$1 - \sum_{m_1=1}^{\infty}\sum_{m_2=1}^{\infty} P_{m_1}(\sigma_1)\,P_{m_2}(\sigma_2)\,\Phi\!\left[\frac{(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right]. \qquad (6.20)$$

Q_n is strictly increasing or decreasing in both σ1 and σ2 according as
η − η0 is less than or exceeds (1/k1² + 1/k2²)^{1/2} d_{ν01,ν02;θ;α}, and is equal to ½ when the latter
two quantities are equal, for all σ1 and σ2. Consequently, the true power curve lies
below or above the limiting power curve according as the first or the second
inequality holds, and is hardly affected by variations in the σi at the critical value
of η. In brief, the test is conservative and is actually somewhat better than stated.
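The behaviour of Q_n described above can be checked directly. The Python sketch of (6.19) below uses hypothetical names and arbitrary illustrative values of k1, k2, m1, m2. It shows that Q_n equals ½ at η − η0 = (1/k1² + 1/k2²)^{1/2} d_{ν01,ν02;θ;α} whatever σ1 and σ2 are, and that otherwise it moves towards ½ as σ1 grows.

```python
import math
from statistics import NormalDist

def critical_difference(k1, k2, d_alpha):
    """C - eta0 = (1/k1^2 + 1/k2^2)^(1/2) d_alpha, from (6.5)."""
    return math.sqrt(1 / k1 ** 2 + 1 / k2 ** 2) * d_alpha

def conditional_power(eta_minus_eta0, k1, k2, d_alpha, sigma1, sigma2, m1, m2):
    """Q_n of (6.19) for final sample sizes m1, m2."""
    sd = math.sqrt(sigma1 ** 2 / m1 + sigma2 ** 2 / m2)
    return 1 - NormalDist().cdf((critical_difference(k1, k2, d_alpha) - eta_minus_eta0) / sd)

c = critical_difference(3.92, 3.92, 2.301)
for s1 in (0.5, 1.0, 2.0):
    print(s1,
          conditional_power(0.0, 3.92, 3.92, 2.301, s1, 1.0, 20, 20),   # below 1/2, rising in s1
          conditional_power(c, 3.92, 3.92, 2.301, s1, 1.0, 20, 20))     # exactly 1/2
```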
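Similarly, to test η = η0 against η ≠ η0 on significance level α, the hypothesis
tested is rejected if

$$\frac{|\bar x_1 - \bar x_2 - \eta_0|}{(1/k_1^2+1/k_2^2)^{1/2}} > d_{\nu_{01},\nu_{02};\theta;\alpha/2}. \qquad (6.21)$$

This controls the risk of error of the first kind for any pair of chosen values of k1 and
k2. The limiting power of the test is

$$1 - \Psi_{\nu_{01},\nu_{02};\theta}\!\left[d_{\nu_{01},\nu_{02};\theta;\alpha/2} - \frac{\eta-\eta_0}{(1/k_1^2+1/k_2^2)^{1/2}}\right] + \Psi_{\nu_{01},\nu_{02};\theta}\!\left[-d_{\nu_{01},\nu_{02};\theta;\alpha/2} - \frac{\eta-\eta_0}{(1/k_1^2+1/k_2^2)^{1/2}}\right]. \qquad (6.22)$$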


Since the density function of the d-statistic is clearly symmetrical round d = 0, the
limiting power function in (6.22) is strictly increasing in |η − η0| and the test is
unbiased. The procedure for satisfying the additional imposed condition that the power
curve passes through the points (η0 ± l, 1 − β), where β is a specified quantity
and l is a given positive quantity, is straightforward, though rather tedious,
in practice. To meet the specifications, we require

$$\beta = \Psi_{\nu_{01},\nu_{02};\theta}\!\left[d_{\nu_{01},\nu_{02};\theta;\alpha/2} - lk_1\sin\theta\right] - \Psi_{\nu_{01},\nu_{02};\theta}\!\left[-d_{\nu_{01},\nu_{02};\theta;\alpha/2} - lk_1\sin\theta\right]. \qquad (6.23)$$
The right-hand member of (6.23) may be tabulated or graphed as a function of lk1,
for any chosen value of θ, with the aid of Fisher's extended Tables of the distribution
(1941). It equals 1 − α when lk1 = 0 and decreases steadily to zero as lk1 → ∞. The
value of lk1 = lk1* (say) corresponding to a value β for the function may thus be
found, and hence also the appropriate value of lk2 = lk2* = lk1* tan θ. The test is
carried through with these values of k1 and k2. If desired, this procedure may be
repeated for various values of θ, and that value of θ finally chosen which minimises
En for some suspected value of δ = σ2²/σ1², as previously.
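Solving (6.23) for lk1 sin θ is a one-dimensional root search, since the right-hand side decreases from 1 − α to 0. A minimal bisection sketch in Python follows. Purely for illustration, it assumes that the pilot degrees of freedom are large enough for Ψ to be replaced by the normal distribution function Φ. With Sukhatme's or Fisher's tables one would substitute the appropriate Ψ and d_{ν01,ν02;θ;α/2}. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def solve_power_condition(alpha, beta, tol=1e-10):
    """Return x = l*k1*sin(theta) solving (6.23), with Psi replaced by Phi (large-df assumption)."""
    if not 0 < beta < 1 - alpha:
        raise ValueError("need 0 < beta < 1 - alpha")
    phi = NormalDist().cdf
    q = NormalDist().inv_cdf(1 - alpha / 2)    # stands in for d_{nu01,nu02;theta;alpha/2}

    def rhs(x):
        return phi(q - x) - phi(-q - x)        # right-hand side of (6.23); decreasing for x >= 0

    lo, hi = 0.0, q + 10.0                     # rhs(lo) = 1 - alpha > beta > rhs(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rhs(mid) > beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = solve_power_condition(alpha=0.05, beta=0.2)
theta = math.radians(45)
lk1 = x / math.sin(theta)        # l k1*
lk2 = lk1 * math.tan(theta)      # l k2* = l k1* tan(theta)
print(round(x, 4), round(lk1, 4), round(lk2, 4))
```

Repeating the last three lines for several θ and comparing k1²σ1² + k2²σ2² at a suspected δ reproduces the final step described in the text.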


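Finally, the value of Q_n corresponding to (6.21) for the two-sided test is

$$Q_n = 1 - \Phi\!\left[\frac{(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha/2} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right] + \Phi\!\left[\frac{-(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha/2} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right], \qquad (6.24)$$

and the true power function is

$$1 - \sum_{m_1=1}^{\infty}\sum_{m_2=1}^{\infty} P_{m_1}(\sigma_1)\,P_{m_2}(\sigma_2)\left\{\Phi\!\left[\frac{(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha/2} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right] - \Phi\!\left[\frac{-(1/k_1^2+1/k_2^2)^{1/2}\,d_{\nu_{01},\nu_{02};\theta;\alpha/2} - (\eta-\eta_0)}{(\sigma_1^2/m_1+\sigma_2^2/m_2)^{1/2}}\right]\right\}. \qquad (6.25)$$

As for the one-sided test, inspection of Q_n shows that the test is conservative.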
A LAGUERRE SERIES APPROXIMATION TO THE SAMPLING
DISTRIBUTION OF THE VARIANCE


By J. ROY and M. L. TIKU
Indian Statistical Institute


SUMMARY. The first four terms of the Laguerre series expansion of the distribution of the
variance of a sample from any population are worked out.


1. INTRODUCTION 

An approximation to the sampling distribution of the variance in samples from any 
non-normal population was given by Gayen (1949). He started with a Gram-Charlier series 
expansion of the probability density function of the population and ignored all cumulants 
of the population above the fourth and also squares and higher powers of the fourth cumu- 
lant. An alternative approach is presented in this paper. The probability density function 
of the sample variance is expanded in terms of a Gamma density function and Laguerre 
polynomials and the coefficients of the first four terms are worked out explicitly in terms of 
population cumulants of up to the eighth order. Gayen’s expression agrees with this expansion
to the order of approximation used by Gayen. 


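Laguerre polynomials. For m > 0, a Laguerre polynomial of degree r in x is
defined as (see Szegö (1939), Chapter V)

$$L_r^{(m)}(x) = \frac{1}{r!}\sum_{t=0}^{r}(-1)^t\binom{r}{t}c_{rt}^{(m)}\,x^t, \qquad (1.1)$$

where

$$c_{rt}^{(m)} = \begin{cases}(m+t)(m+t+1)\cdots(m+r-1), & t = 0, 1, \ldots, r-1,\\ 1, & t = r,\end{cases}$$

for r = 0, 1, 2, .... In particular, the polynomials of degree 0 to 4 are:

$$L_0^{(m)}(x) = 1,$$
$$L_1^{(m)}(x) = m - x,$$
$$L_2^{(m)}(x) = \frac{1}{2!}\left[m(m+1) - 2(m+1)x + x^2\right], \qquad (1.2)$$
$$L_3^{(m)}(x) = \frac{1}{3!}\left[m(m+1)(m+2) - 3(m+1)(m+2)x + 3(m+2)x^2 - x^3\right],$$
$$L_4^{(m)}(x) = \frac{1}{4!}\left[m(m+1)(m+2)(m+3) - 4(m+1)(m+2)(m+3)x + 6(m+2)(m+3)x^2 - 4(m+3)x^3 + x^4\right].$$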
If we write

$$p_m(x) = \frac{e^{-x}x^{m-1}}{\Gamma(m)}, \qquad 0 \le x < \infty, \qquad (1.3)$$

for the Gamma density function with mean m, the orthogonality property of Laguerre
polynomials can be stated as:

$$\int_0^\infty L_r^{(m)}(x)\,L_s^{(m)}(x)\,p_m(x)\,dx = \begin{cases}0 & \text{if } r \ne s,\\ c_{r0}^{(m)}/r! & \text{if } r = s.\end{cases} \qquad (1.4)$$
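Definition (1.1) and property (1.4) can be verified exactly with rational arithmetic, because the moments of the Gamma density (1.3) are ∫ x^k p_m(x) dx = m(m+1)···(m+k−1). The Python sketch below builds the coefficients of L_r^{(m)} from (1.1) and computes the inner products for r, s ≤ 4. The helper names are ours.

```python
from fractions import Fraction
from math import comb, factorial

def rising(m, k):
    """m (m+1) ... (m+k-1), with rising(m, 0) = 1."""
    out = Fraction(1)
    for j in range(k):
        out *= m + j
    return out

def laguerre_coeffs(r, m):
    """Coefficients of x^0, ..., x^r in L_r^{(m)}(x), following (1.1)."""
    # c_{rt}^{(m)} = (m+t)(m+t+1)...(m+r-1) = rising(m + t, r - t); it equals 1 when t = r.
    return [(-1) ** t * comb(r, t) * rising(m + t, r - t) / factorial(r) for t in range(r + 1)]

def inner(r, s, m):
    """Integral of L_r L_s p_m over (0, inf), using E[X^k] = rising(m, k) under (1.3)."""
    a, b = laguerre_coeffs(r, m), laguerre_coeffs(s, m)
    return sum(ai * bj * rising(m, i + j) for i, ai in enumerate(a) for j, bj in enumerate(b))

m = Fraction(7, 2)
gram = [[inner(r, s, m) for s in range(5)] for r in range(5)]
print(gram)   # diagonal entries c_{r0}/r!, all other entries exactly 0
```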


2. APPROXIMATE DISTRIBUTION OF THE CORRECTED SUM OF SQUARES 


Let Y1, Y2, ..., Yn be a random sample from a population in which cumulants of all
orders exist, and let the r-th cumulant be denoted by K_r, r = 2, 3, .... The variance of the
population will be alternatively denoted by σ² = K2. Further, we shall write

$$\lambda_r = K_r/\sigma^r, \qquad r = 3, 4, \ldots \qquad (2.1)$$

Consider the sum of squares about the sample mean: S² = Σ_{i=1}^{n} (Yi − Ȳ)², where Ȳ = (1/n) Σ_{i=1}^{n} Yi.
This is said to have ν = (n − 1) degrees of freedom. We shall write

$$X = S^2/2\sigma^2 \qquad (2.2)$$

and try to derive an approximation for the probability density function of X.

We first note that the cumulants of the distribution of s² = S²/ν have been worked out
by Fisher (1928), and that the first few are tabulated in Kendall (1947). Using these, we get
the first four central moments of X, and these are listed below:


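$$E(X) = m,$$

$$\mu_2(X) = m + \frac{m^2}{2m+1}\,\lambda_4,$$

$$\mu_3(X) = 2m + \frac{6m^2}{2m+1}\,\lambda_4 + \frac{m(2m-1)}{2m+1}\,\lambda_3^2 + \frac{m^3}{(2m+1)^2}\,\lambda_6, \qquad (2.3)$$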
A) = тта) РАН В), nO D ap Tm 

m(3m>+ 16m?—2m--1) 4$ , 8m 2m—1) mi 
temp ame Мирр № 
where we write for simplicity

$$m = \frac{\nu}{2} = \frac{n-1}{2}. \qquad (2.4)$$
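The mean, μ2 and μ3 of X in (2.3) can be coded directly. The Python sketch below has a hypothetical function name and assumes λ_r = K_r/σ^r as in (2.1). For a normal population (all λ_r = 0) it returns the Gamma values m, m, 2m. Its μ2 also agrees with the familiar Var(s²) = σ⁴[2/(n − 1) + λ4/n], since μ2(X) = m²Var(s²)/σ⁴.

```python
def moments_of_x(n, lam3, lam4, lam6):
    """E(X), mu2(X), mu3(X) for X = S^2 / (2 sigma^2), with m = (n - 1)/2 as in (2.4)."""
    m = (n - 1) / 2
    mu2 = m + m ** 2 * lam4 / (2 * m + 1)
    mu3 = (2 * m
           + 6 * m ** 2 * lam4 / (2 * m + 1)
           + m * (2 * m - 1) * lam3 ** 2 / (2 * m + 1)
           + m ** 3 * lam6 / (2 * m + 1) ** 2)
    return m, mu2, mu3

print(moments_of_x(11, 0.0, 0.0, 0.0))   # (5.0, 5.0, 10.0): the Gamma(m) values
print(moments_of_x(11, 0.5, 1.5, 3.0))
```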


Let us denote by φ_m(x) the probability density function of X. The quotient
φ_m(x)/p_m(x) can be formally expanded in an infinite series in Laguerre polynomials as

$$\frac{\varphi_m(x)}{p_m(x)} = \sum_{r=0}^{\infty} a_r^{(m)} L_r^{(m)}(x). \qquad (2.5)$$

Multiplying both sides of (2.5) by L_r^{(m)}(x) p_m(x) and integrating over x from 0 to ∞, we get

$$a_r^{(m)} = \frac{r!}{c_{r0}^{(m)}}\int_0^\infty L_r^{(m)}(x)\,\varphi_m(x)\,dx \qquad (2.6)$$


formally, by virtue of the orthogonal property (1.4) of Laguerre polynomials. For conditions
of convergence of the formal expansion (2.5) see Szegö (1939), Chapter IX.

What we seek here is an approximation to φ_m(x) using only the first four terms in
(2.5). Thus

$$\varphi_m(x) \approx p_m(x)\sum_{r=0}^{4} a_r^{(m)} L_r^{(m)}(x), \qquad (2.7)$$

where

$$a_r^{(m)} = \frac{r!}{c_{r0}^{(m)}}\,E\!\left[L_r^{(m)}(X)\right], \qquad r = 0, 1, 2, 3, 4. \qquad (2.8)$$


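To evaluate E[L_r^{(m)}(X)] for r = 0, 1, 2, 3 and 4, we note that, writing y = x − m, the
Laguerre polynomials given by (1.2) can be expressed in terms of y as:

$$L_0^{(m)} = 1,$$
$$L_1^{(m)} = -y,$$
$$L_2^{(m)} = \frac{1}{2!}\left(y^2 - 2y - m\right), \qquad (2.9)$$
$$L_3^{(m)} = -\frac{1}{3!}\left(y^3 - 6y^2 - 3(m-2)y + 4m\right),$$
$$L_4^{(m)} = \frac{1}{4!}\left(y^4 - 12y^3 + 6(6-m)y^2 + 4(7m-6)y + 3m(m-6)\right).$$

Hence

$$E[L_0^{(m)}(X)] = 1, \qquad E[L_1^{(m)}(X)] = 0, \qquad E[L_2^{(m)}(X)] = \frac{1}{2!}(\mu_2 - m),$$
$$E[L_3^{(m)}(X)] = -\frac{1}{3!}(\mu_3 - 6\mu_2 + 4m), \qquad (2.10)$$
$$E[L_4^{(m)}(X)] = \frac{1}{4!}\left(\mu_4 - 12\mu_3 + 6(6-m)\mu_2 + 3m(m-6)\right),$$

where μ2, μ3, μ4 are respectively the second, third and fourth central moments of X given
by (2.3). Using (2.8) and (2.10), we finally get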


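$$a_0^{(m)} = 1, \qquad a_1^{(m)} = 0,$$

$$a_2^{(m)} = \frac{\mu_2 - m}{m(m+1)} = \frac{m\,\lambda_4}{(m+1)(2m+1)},$$

$$a_3^{(m)} = -\frac{\mu_3 - 6\mu_2 + 4m}{m(m+1)(m+2)} = -\frac{1}{(m+1)(m+2)}\left[\frac{2m-1}{2m+1}\,\lambda_3^2 + \frac{m^2}{(2m+1)^2}\,\lambda_6\right], \qquad (2.11)$$

$$a_4^{(m)} = \frac{\mu_4 - 12\mu_3 + 6(6-m)\mu_2 + 3m(m-6)}{m(m+1)(m+2)(m+3)},$$

with μ4 as given in (2.3).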
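Formulae (2.8) and (2.10) give the coefficients in terms of the central moments of X. The Python sketch below (hypothetical name) returns a_0, ..., a_4. For the Gamma moments μ2 = m, μ3 = 2m, μ4 = 3m² + 6m, which is the normal-population case, all the correction coefficients vanish, as they should.

```python
import math

def laguerre_coefficients(m, mu2, mu3, mu4):
    """a_0, ..., a_4 of (2.7) via (2.8) and (2.10), from the central moments of X."""
    expectations = [1.0,                                   # E[L_0(X)]
                    0.0,                                   # E[L_1(X)]
                    (mu2 - m) / 2,                         # E[L_2(X)]
                    -(mu3 - 6 * mu2 + 4 * m) / 6,          # E[L_3(X)]
                    (mu4 - 12 * mu3 + 6 * (6 - m) * mu2 + 3 * m * (m - 6)) / 24]
    coeffs = []
    for r, e_r in enumerate(expectations):
        c_r0 = 1.0
        for j in range(r):
            c_r0 *= m + j                                  # c_{r0}^{(m)} = m (m+1) ... (m+r-1)
        coeffs.append(math.factorial(r) * e_r / c_r0)      # (2.8)
    return coeffs

m = 2.5
print(laguerre_coefficients(m, m, 2 * m, 3 * m ** 2 + 6 * m))   # [1.0, 0.0, 0.0, 0.0, 0.0]
```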
The probability density function of the sample variance Z = S²/ν is then
obtained as

$$\frac{m}{\sigma^2}\,p_m(m\sigma^{-2}z)\sum_{r=0}^{4} a_r^{(m)}\,L_r^{(m)}(m\sigma^{-2}z). \qquad (2.12)$$

If terms involving λ_r for r > 4 and λ4² are ignored, this agrees with Gayen's (1949)
formula (3.1).
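A sketch of the approximation (2.12) in Python follows. The coefficients in the example are made-up illustrative values, not derived from any population. Every L_r^{(m)} with r ≥ 1 is orthogonal to L_0^{(m)} = 1 under p_m, so the truncated series integrates to 1 whatever the coefficients. The trapezoidal check below confirms this numerically. Function names are ours.

```python
import math

def laguerre(r, m, x):
    """L_r^{(m)}(x) as defined in (1.1)."""
    total = 0.0
    for t in range(r + 1):
        c = 1.0
        for j in range(t, r):            # c_{rt} = (m+t)...(m+r-1); empty product when t = r
            c *= m + j
        total += (-1) ** t * math.comb(r, t) * c * x ** t
    return total / math.factorial(r)

def gamma_density(m, x):
    """p_m(x) of (1.3), for m > 1 and x >= 0."""
    return 0.0 if x <= 0 else math.exp((m - 1) * math.log(x) - x - math.lgamma(m))

def variance_density(z, m, sigma2, a):
    """Right-hand side of (2.12): approximate density of Z = S^2/nu, with a = [a_0, ..., a_4]."""
    x = m * z / sigma2
    return m / sigma2 * gamma_density(m, x) * sum(ar * laguerre(r, m, x) for r, ar in enumerate(a))

m, sigma2, a = 5.0, 1.0, [1.0, 0.0, 0.1, -0.02, 0.01]    # illustrative coefficients only
h = 0.001
mass = sum(variance_density(k * h, m, sigma2, a) for k in range(1, 20000)) * h
print(round(mass, 6))
```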
MAIN EFFECTS AND INTERACTIONS*


By HENRY B. MANN
Ohio State University
SUMMARY. Expected yields of observations, for an arbitrary region of factor space, are
decomposed into main effects and interactions of the various factors present. Methods to test the
significance of the contribution to the highest order interaction from a sub-set of the factor space are developed
for the cases when (1) some a priori information about the interaction is available, (2) there is no a priori
information. The test used here is shown to be more powerful than the test for the highest order
interaction in the entire region.


1. INTRODUCTION 


In a factorial experiment on two factors we apply each factor on varying
levels to various experimental units. We shall assume that this application yields
for each unit a quantity which we shall call the yield of this unit. We denote by
f(x, y) the mean value of the yield obtained when the first factor is applied at level
x and the second factor at level y. The levels may be capable of discrete values
only or they may vary continuously.


Intuitively one feels that the function f(x, y) should be broken up into a
general mean μ, an effect g(x) of the first factor, an effect h(y) of the second factor
and an effect u(x, y) ascribed to the combination of level x of the first factor with level
y of the second factor. The function g(x) is to denote the deviation from the general
mean due to the application of the level x of the first factor. The function u(x, y)
is to denote a deviation from the mean value of f(x, y) − g(x) when y is fixed.
Accordingly we should like to impose the restrictions


$$\int g(x)\,dx = \int h(y)\,dy = \int u(x,y)\,dx = \int u(x,y)\,dy = 0, \qquad (1.1)$$

where the integrals have to be replaced by sums if the levels are discrete, and throughout
this paper each integral (or sum) is extended over the domain of definition of the
integrand (or summand) unless otherwise stated. We then have the decomposition

$$f(x,y) = \mu + g(x) + h(y) + u(x,y). \qquad (1.2)$$


The quantities g(x), h(y), μ and u(x, y) are estimated by least squares. This
amounts to defining g(x) and h(y) by the condition that

$$\iint \bigl(f(x,y) - \mu - g(x) - h(y)\bigr)^2\,dy\,dx \qquad (1.3)$$

should be minimized by the choice of the quantities μ, g(x), h(y). Differentiation with
respect to μ and with respect to g(x) at each x level and h(y) at each y level (or
alternately a simple application of the calculus of variations) leads to

$$\iint \bigl(f(x,y) - \mu - g(x) - h(y)\bigr)\,dy\,dx = 0,$$
$$\int \bigl(f(x,y) - \mu - g(x) - h(y)\bigr)\,dx = 0,$$
$$\int \bigl(f(x,y) - \mu - g(x) - h(y)\bigr)\,dy = 0. \qquad (1.4)$$

* ... under Contract No. DA-11-022-ORD-2059, Mathematics ... Wisconsin.
Now assume that f(x, y) is defined for 0 ≤ x ≤ a, 0 ≤ y ≤ b. Then (1.4) and
(1.1) lead to the unique solution

$$\mu = \frac{1}{ab}\int_0^a\!\!\int_0^b f(x,y)\,dy\,dx, \qquad g(x) = \frac{1}{b}\int_0^b f(x,y)\,dy - \mu, \qquad h(y) = \frac{1}{a}\int_0^a f(x,y)\,dx - \mu. \qquad (1.5)$$

If the levels are discrete the solutions are obtained by replacing integrals by
sums.


In the following we shall restrict ourselves to the case that the levels are
discrete, but many of the statements and proofs can easily be modified to apply to
continuous levels.
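For discrete levels the solution (1.5) becomes simple: μ is the grand mean, g and h are the row and column means minus μ, and u is what remains. A pure-Python sketch for a complete a × b table follows (function name ours):

```python
def two_way_decomposition(f):
    """Discrete form of (1.5) for a complete table f[x][y]:
    returns mu, g[x], h[y], u[x][y] with f = mu + g + h + u."""
    a, b = len(f), len(f[0])
    mu = sum(sum(row) for row in f) / (a * b)
    g = [sum(row) / b - mu for row in f]
    h = [sum(f[x][y] for x in range(a)) / a - mu for y in range(b)]
    u = [[f[x][y] - mu - g[x] - h[y] for y in range(b)] for x in range(a)]
    return mu, g, h, u

# A table with an interaction: only the (1, 1) cell is raised.
mu, g, h, u = two_way_decomposition([[0, 0], [0, 1]])
print(mu, g, h, u)   # 0.25 [-0.25, 0.25] [-0.25, 0.25] [[0.25, -0.25], [-0.25, 0.25]]
```

Every row and column of u sums to zero, as required by (1.1). For an additive table such as [[1, 2, 3], [3, 4, 5]] the interaction part u vanishes.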


The idea can easily be generalized to any number of factors. We introduce
the decomposition:

$$x_{a_1,\ldots,a_n} = \sum_{\alpha=0}^{n}{\sum}'\,\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}}, \qquad (1.6)$$

where Σ′ means summation over all choices i1 < ... < iα out of 1, ..., n, and the
quantities μ^{i1...iα}_{a_{i1}...a_{iα}} satisfy the conditions

$$\sum_{a_{i_\beta}} \mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} = 0, \qquad \beta = 1, \ldots, \alpha;\ \alpha \ge 1. \qquad (1.7)$$

(In (1.7) all a's except a_{iβ} are fixed and a_{iβ} runs over all values for which the summand
is defined.) That the decomposition (1.6) is always possible will follow from
Theorem 1.


The quantity μ^{i}_{a} is called the main effect of the a-th level of the i-th factor. The
quantity μ^{i1...iα}_{a1...aα}, α > 1, is called the interaction between the levels a1, ..., aα of the
factors i1, ..., iα. The totality of all μ^{i1...iα}_{a1...aα} for all possible values a1, ..., aα is called the
interaction (i1, ..., iα).
A set I of interactions will be called a natural set of interactions if (i1, ..., iα) ∈ I and
(j1, ..., jβ) ⊂ (i1, ..., iα) implies (j1, ..., jβ) ∈ I.
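The definition of a natural set amounts to closure under taking subsets. The Python sketch below represents interactions as tuples of factor indices and the general mean as the empty tuple. This representation is an assumption made for illustration.

```python
from itertools import combinations

def is_natural(interactions):
    """True if every sub-tuple of each member (factor indices, sorted) is also a member."""
    members = {tuple(sorted(i)) for i in interactions}
    return all(sub in members
               for i in members
               for k in range(len(i))
               for sub in combinations(i, k))

print(is_natural({(), (1,), (2,), (1, 2)}))   # True
print(is_natural({(), (1,), (1, 2)}))         # False: (2,) is missing
```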

Theorem 1: Let S be any set of points (a1, ..., an), let x_{a1,...,an} be a function
defined on S and let I be a natural set of interactions. If the equations

$$x_{a_1,\ldots,a_n} = \sum_{(i_1,\ldots,i_\alpha)\in I} m^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} \qquad (1.8)$$

have a solution for the m^{i1...iα}_{a1...aα}, then there also exists a solution μ^{i1...iα}_{a1...aα} satisfying (1.7).
The solution μ^{i1...iα}_{a1...aα} is however not necessarily unique.
01, «e Ue 


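Proof: For α = 0 in (1.7) there is nothing to prove. By induction we may
assume that in (1.8) all m^{i1...iα}_{a_{i1}...a_{iα}} satisfy (1.7) for α < u. For simplicity assume
(1, ..., u) ∈ I. Consider the quadratic form

$$Q = \sum_{a_1,\ldots,a_u}\Bigl(m^{1\ldots u}_{a_1\ldots a_u} - \sum_{\alpha=0}^{u-1}{\sum}'\,\gamma^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}}\Bigr)^2, \qquad (1.9)$$

where Σ′ now extends over all choices i1 < ... < iα out of 1, ..., u. If we minimize Q with
respect to the parameters γ^{i1...iα}_{a_{i1}...a_{iα}} we get, among others, the equations

$$\sum_{a_{i_\beta}}\Bigl(m^{1\ldots u}_{a_1\ldots a_u} - \sum_{\alpha=0}^{u-1}{\sum}'\,\gamma^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}}\Bigr) = 0, \qquad \beta = 1, \ldots, u, \qquad (1.10)$$

resulting from differentiation of Q with respect to the γ of order u − 1. Since Q
has a minimum, (1.10) must have a solution. By induction we may assume that the
solutions γ^{i1...iα}_{a_{i1}...a_{iα}} satisfy (1.7). We now put

$$\mu^{1\ldots u}_{a_1\ldots a_u} = m^{1\ldots u}_{a_1\ldots a_u} - \sum_{\alpha=0}^{u-1}{\sum}'\,\gamma^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}},$$
$$\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} = m^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} + \gamma^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} \quad \text{for } (i_1,\ldots,i_\alpha) \subset (1,\ldots,u),\ \alpha < u, \qquad (1.11)$$
$$\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} = m^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} \quad \text{for } i_\alpha > u.$$

Clearly μ^{1...u}_{a1...au} satisfies (1.7), and so do the μ^{i1...iα}_{a_{i1}...a_{iα}} with α < u. We can apply the
same process to every interaction of order u in I. Hence the theorem.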
Corollary 1: The representation (1.6) is possible for any set S and any function
x_{a1,...,an} defined in S.


For we can simply put m^{1...n}_{a1...an} = x_{a1,...,an}, m^{i1...iα}_{a_{i1}...a_{iα}} = 0 for α < n, and
apply Theorem 1.


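If x_{a1,...,an} is defined in the region T = {1 ≤ a1 ≤ t1, ..., 1 ≤ an ≤ tn} then
(1.6) and (1.7) have a unique solution

$$\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} = \sum_{\beta=0}^{\alpha}(-1)^{\alpha-\beta}{\sum}'\,\bar x^{j_1\ldots j_\beta}_{a_{j_1}\ldots a_{j_\beta}}, \qquad (1.12)$$

where Σ′ extends over all choices j1 < ... < jβ out of i1, ..., iα, and x̄^{j1...jβ}_{b1...bβ} is the mean of
all values x_{c1,...,cn} with c_{j1} = b1, ..., c_{jβ} = bβ.

For the derivation of this formula and more detailed discussions, see Mann (1949).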
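For a complete n-way array, formula (1.12) is an inclusion-exclusion over marginal means. The NumPy sketch below treats factors as array axes; the function name is ours. It computes every μ^{i1...iα} and checks that the effects add back up to x and satisfy the side conditions (1.7).

```python
import itertools
import numpy as np

def effects(x):
    """All mu^{i1..ia} of (1.12) for a complete n-way array x; keys are axis tuples."""
    n = x.ndim
    out = {}
    for a in range(n + 1):
        for idx in itertools.combinations(range(n), a):
            total = np.zeros([x.shape[i] for i in idx])
            for b in range(a + 1):
                for sub in itertools.combinations(idx, b):
                    other = tuple(i for i in range(n) if i not in sub)
                    margin = x.mean(axis=other) if other else x    # mean over all other factors
                    shape = [x.shape[i] if i in sub else 1 for i in idx]
                    total = total + (-1) ** (a - b) * np.reshape(margin, shape)
            out[idx] = total
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))
eff = effects(x)
recon = sum(np.reshape(v, [x.shape[i] if i in k else 1 for i in range(x.ndim)])
            for k, v in eff.items())
print(np.allclose(recon, x))   # True: decomposition (1.6) is recovered
```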


The main effects and interactions change if the region changes in which the
function f(x1, ..., xn) is considered. In a given experiment this region depends on
our choice of levels of the factors and we shall therefore call it the design of the
experiment. We can give the main effects and interactions an absolute meaning if we
consider a fixed region T and consider only designs from this region. One might
contemplate procedures for choosing a design or a sample of designs which would
permit unbiased estimation of the "true" main effects and interactions (i.e. those
arising from the region T). However, this is not the goal of the present investigation.
The fact that the main effects and interactions depend on the design seems to detract 
from the value of factorial experiments and of the decomposition (1.6). However, 
there is often considerable interest in the values which the main effects and inter- 
actions take in a particular design irrespective of their “true” values. There is more- 
over a great deal of valuable information to be extracted from the way in which the 
interactions change when the design is modified. 


To illustrate this point consider a nutritional experiment in which various 
levels of carbohydrate and of protein are fed to animals and their weight gains after 
some period are recorded. We assume that all carbohydrate levels administered 
include sufficient amounts to supply the energy requirements of the animals. Let 
us make the simplifying, but approximately correct, assumption that there is a certain 
amount p of protein necessary for the proper growth and upkeep of the animal and 
that any additional amounts of protein will be utilized in the same way as carbo- 
hydrates are used. Thus amounts of protein exceeding the level p can be substituted 
by proper amounts of carbohydrates. On the other hand, if the protein level is less
than p we assume that increasing the carbohydrate level will have no effect on the
animal. (This may be an oversimplification, but it is not the author’s intention to
write a paper on animal nutrition.) Under these assumptions, if all protein levels
administered exceed the minimum level p, we shall find no interaction, because protein
and carbohydrate can be substituted for each other. On the other hand, if all protein
levels are below p, then the weight of the animal will depend only on the amount of
protein fed, the carbohydrate effect will be zero and again the interaction will be zero.
We shall find an interaction only if the protein levels include levels above and below
p. Thus the dependence of the interaction on the design reflects facts which are of
basic importance in animal nutrition. On the other hand, if we carry out an
experiment including high and low protein levels and compute the interaction from this
experiment, we may find high interactions in all experimental units and it may not
be possible just by "looking at the data" to find that the interaction arose from the
inclusion of the low levels. It seems therefore of importance to develop procedures
which will help us to locate the levels whose inclusion gives rise to the interaction.


2. TESTS FOR INTERACTIONS 


As in many other applications of the analysis of variance, two situations have
to be distinguished. We may (Case I) feel justified in assuming that interaction, if
it arises at all, will arise from the inclusion of certain specified levels, known in
advance; or we may (Case II) not be in possession of such a priori information. We shall
develop methods for each case. The computational problems in both cases are similar,
but the applications and interpretation of our statements are different.


Case I fits completely into the scheme of testing hypotheses. Let T be the
total range of our experiment and let S be the subrange suspected of giving rise to
the highest order interaction. Our Assumption A then is that all highest order
interactions are 0 in T − S, and we test the hypothesis that these interactions are 0 in T.
(It follows from Theorem 1 that the absence of highest order interactions in T implies
absence of such interactions in T − S.) In formal language our linear hypothesis
reads:


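$$A:\ E(x_{a_1,\ldots,a_n}) = \sum_{\alpha=0}^{n-1}{\sum}'\,\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} \quad \text{in } T - S, \qquad (2.1)$$

$$H:\ E(x_{a_1,\ldots,a_n}) = \sum_{\alpha=0}^{n-1}{\sum}'\,\mu^{i_1\ldots i_\alpha}_{a_{i_1}\ldots a_{i_\alpha}} \quad \text{in } T.$$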
The test proceeds along the lines given by Mann (1949, pp. 23 ff.). The rank
of the matrix $(g_{\beta\alpha})$ of eqn. 4.6 of Mann (1949) must, however, be found first, and this
is not always easy.

The procedure is as follows. Consider a matrix whose rows are numbered
by all combinations $(a_1, \dots, a_n)$ occurring in $T-S$ and whose columns are headed by
all symbols $\binom{i_1,\dots,i_\alpha}{b_{i_1},\dots,b_{i_\alpha}}$, $\alpha \le n-1$, such that at least one experiment occurs in $T-S$
for which the factors $i_1, \dots, i_\alpha$ are at levels $b_{i_1}, \dots, b_{i_\alpha}$. Write 1 into the intersection
of row $a_1, \dots, a_n$ and column $\binom{i_1,\dots,i_\alpha}{b_{i_1},\dots,b_{i_\alpha}}$ if $a_{i_1} = b_{i_1}, \dots, a_{i_\alpha} = b_{i_\alpha}$; otherwise write 0.
Let $p$ be the rank of this matrix and $N$ the number of observations in $T-S$. Assume
$N > p$ (otherwise no test is possible, unless an independent estimate of the variance
is available).
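Building this 0/1 matrix is mechanical. Below is a minimal Python sketch. It assumes cells are given as tuples of 1-based levels and uses NumPy's `matrix_rank` for $p$. The function name `incidence_matrix` and the 3 × 3 example with $S = \{(1, 1)\}$ are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations, product
import numpy as np

def incidence_matrix(cells, n):
    """0/1 matrix of the procedure: rows are the cells (a_1, ..., a_n) of a region,
    columns are the symbols (i_1..i_alpha ; b_{i_1}..b_{i_alpha}), alpha <= n - 1,
    that occur in the region. Factors are indexed 0..n-1 here."""
    columns, seen = [], set()
    for alpha in range(n):                          # alpha = 0, ..., n-1
        for factors in combinations(range(n), alpha):
            for cell in cells:
                symbol = (factors, tuple(cell[i] for i in factors))
                if symbol not in seen:              # keep only symbols that occur
                    seen.add(symbol)
                    columns.append(symbol)
    M = np.zeros((len(cells), len(columns)), dtype=int)
    for row, cell in enumerate(cells):
        for col, (factors, levels) in enumerate(columns):
            # 1 if the cell has factors i_1..i_alpha at levels b_{i_1}..b_{i_alpha}
            if all(cell[i] == b for i, b in zip(factors, levels)):
                M[row, col] = 1
    return M, columns

# Example: 3 x 3 layout with S = {(1, 1)}; the rows are the cells of T - S.
cells = [c for c in product(range(1, 4), repeat=2) if c != (1, 1)]
M, columns = incidence_matrix(cells, 2)
p = int(np.linalg.matrix_rank(M))
```

Here $T-S$ has $N = 8$ cells. The matrix has 7 columns: the grand mean and the three levels of each factor. Its rank is $p = 5$.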
Minimize $Q = \sum \cdots \sum \left(x_{a_1,\dots,a_n} - E(x_{a_1,\dots,a_n})\right)^2$ under the assumption A,
where the sum is extended over all combinations $a_1, \dots, a_n$ occurring in $T$. Write
$Q_A$ for the minimum value of $Q$ under A. Since $E(x_{a_1,\dots,a_n})$ is unrestricted in $S$,
$Q_A$ can be found by minimizing the sum of squares extended over the combinations
in $T-S$ only.

Find the rank $s$ of a similar matrix for the space $T$ and let $Q_H$ be the minimum
of $Q$ under the hypothesis H. Let $N_1$ be the number of observations in $T$.


The assumption A imposes $N-p$ independent linear restrictions, so that $Q_A$
has $N-p$ degrees of freedom. The hypothesis H imposes $N_1-s$ independent linear
restrictions, hence $N_1-N-s+p$ additional linear restrictions. Hence, if the $x_{a_1,\dots,a_n}$
are normally and independently distributed with the same variance and if H is true,
the statistic

$$F = \frac{N-p}{N_1-N+p-s}\cdot\frac{Q_H-Q_A}{Q_A} \qquad (2.2)$$

has the F distribution with $N_1-N+p-s$ and $N-p$ degrees of freedom.
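The statistic (2.2) can be sketched numerically with one observation per cell. The model without the highest order interaction is fitted once on $T-S$ and once on all of $T$. This is an illustration under those assumptions only: the helper names are hypothetical, and NumPy least squares is used in place of the explicit formulae of Section 3.

```python
import numpy as np
from itertools import combinations, product

def design_matrix(cells, n):
    # One column per symbol (factors; levels), alpha = 0, ..., n-1, occurring in
    # `cells`, i.e. the model containing every interaction except the highest one.
    cols = sorted({(f, tuple(c[i] for i in f))
                   for alpha in range(n)
                   for f in combinations(range(n), alpha)
                   for c in cells})
    return np.array([[1.0 if all(c[i] == b for i, b in zip(f, lv)) else 0.0
                      for (f, lv) in cols]
                     for c in cells])

def rss_and_rank(cells, y, n):
    X = design_matrix(cells, n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # any least square solution
    resid = y - X @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X))

def interaction_F(x, S, n):
    """x: dict mapping each cell (a_1, ..., a_n) of T to its observation;
    S: set of cells suspected of producing the highest order interaction."""
    T = sorted(x)
    TS = [c for c in T if c not in S]
    Q_A, p = rss_and_rank(TS, np.array([x[c] for c in TS]), n)  # fit on T - S only
    Q_H, s = rss_and_rank(T, np.array([x[c] for c in T]), n)    # fit on all of T
    N, N1 = len(TS), len(T)
    df1, df2 = N1 - N + p - s, N - p
    F = (df2 / df1) * (Q_H - Q_A) / Q_A if Q_A > 1e-12 else float("inf")
    return Q_A, Q_H, df1, df2, F
```

Consider additive data plus a single anomaly $\delta$ in cell $(1,1)$ of a 3 × 3 layout. Then $Q_A = 0$ and $Q_H = 4\delta^2/9$, with 1 and 3 degrees of freedom. For arbitrary data, $Q_H \ge Q_A$.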


In experiments where more than one, but an equal number $r$, of observations
are taken in each subclass, one can obtain an estimate $\hat\sigma^2$ of the variance from the sum of squares
of deviations of each observation from its cell mean, and this estimate has $(r-1)N_1$
degrees of freedom. The quantities $Q_H$ and $Q_A$ must then be computed from $\sqrt{r}\,\bar x_{a_1,\dots,a_n}$,
where the $\bar x_{a_1,\dots,a_n}$ are the cell means.


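To test the hypothesis H under the assumption A one should then use

$$F = \frac{N-p+(r-1)N_1}{N_1-N+p-s}\cdot\frac{Q_H-Q_A}{(r-1)N_1\hat\sigma^2+Q_A}, \qquad (2.3)$$

with $N_1-N+p-s$ and $N-p+(r-1)N_1$ degrees of freedom.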


In this case, or in any other case where an independent estimate $\hat\sigma^2$ with, say,
$\nu$ degrees of freedom is available, one can also test the contribution of the region $S$
to the highest order interaction sum of squares by using

$$F = \frac{Q_H-Q_A}{(N_1-N+p-s)\,\hat\sigma^2}, \qquad (2.4)$$

which has $N_1-N+p-s$ and $\nu$ degrees of freedom.
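Once $Q_H$, $Q_A$ and the counts are known, (2.3) and (2.4) are plain arithmetic. The Python sketch below assumes two things: that (2.3) pools the within-cell estimate with $Q_A$, and that (2.4) divides $Q_H - Q_A$ by its $N_1-N+p-s$ degrees of freedom. The function names are hypothetical.

```python
def F_replicated(Q_H, Q_A, N, N1, p, s, r, sigma2_hat):
    """Statistic (2.3): r observations per cell; error pooled from the
    within-cell estimate ((r - 1) N1 d.f.) and Q_A (N - p d.f.)."""
    df1 = N1 - N + p - s
    df2 = N - p + (r - 1) * N1
    F = (df2 / df1) * (Q_H - Q_A) / ((r - 1) * N1 * sigma2_hat + Q_A)
    return F, df1, df2

def F_independent(Q_H, Q_A, N, N1, p, s, sigma2_hat, nu):
    """Statistic (2.4): independent variance estimate with nu d.f."""
    df1 = N1 - N + p - s
    return (Q_H - Q_A) / (df1 * sigma2_hat), df1, nu
```

The only design choice is to return the degrees of freedom alongside $F$. Either value can then be referred to an F table.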


The statistic (2.4) will be more powerful than a test for the highest order inter-
actions in $T$ if these interactions are low in $T-S$ compared to the interactions in $T$,
particularly so if $T-S$ is large compared to $S$.

The procedure is similar if the assumption consists of the statement that
certain interaction sets $I_1$ are 0 in $T$ and some additional interaction sets $I_2$ are 0 in
$T-S$, while H states that the interactions of $I_1 \cup I_2$ are 0 in $T$. It is of course neces-
sary that H really implies A. It follows from Theorem 1 that this is always the case
if the complement of $I_1 \cup I_2$ is a natural set of interactions.
[The opening of Section 3, including equations (3.1) to (3.4), is largely illegible in the source scan. Recoverable fragments: "... $S$. Without loss of generality we may in this case assume ..."; "... linear hypothesis (2.1) in this case. The value of $Q_A$ is ... (3.2)"; "... in $T-S$ we may ... square estimates ... from ... We then have ..."; with a reference to Mann (1949, eqns. 5.5 and 5.9).]
where we substitute on the right the values of (3.1') if the combination $a_1, \dots, a_n$
occurs in $T-S$, and 0 otherwise. Clearly, any set of values which solves (3.4) will
also solve (3.1').


The equations (3.4) have the unique solution [Mann (1949), eqn. 5.5]

[Equation (3.5) illegible in the source scan.]


Now let $b_1 \le s_1, \dots, b_u \le s_u$ be fixed, $s_{u+1} < t_{u+1}, \dots, s_l < t_l$, $s_{l+1} = t_{l+1}, \dots, s_n = t_n$.
(We permit any rearrangement of the indices, and it is permissible that $s_v = t_v$ for
$v \le u$.) Then if $a_j > s_j$ for at least one value of $j$ we have

[Equation (3.6) illegible in the source scan.]


and therefore from (3.5)

[Equation (3.7) illegible in the source scan.]

where the second double sum is extended over all choices $i_1 < \dots < i_\beta$, $j_1 < \dots < j_\alpha$,
where $i_1, \dots, i_\beta$ are chosen from $1, \dots, u$ and $j_1, \dots, j_\alpha$ from $u+1, \dots, l$. On account
of (3.6), if $a_j > s_j$ for all $u < j \le l$, this may be written


[Equation (3.8) illegible in the source scan.]

Now

[Equation (3.9) illegible in the source scan.]

since $T-S$ does not contain observations with $b_1 \le s_1, \dots, b_u \le s_u$, $a_{u+1} \le s_{u+1}, \dots, a_l \le s_l$.
We now put

[Definition illegible in the source scan.]

and sum (3.8) over all values $a_{u+1} > s_{u+1}, \dots, a_l > s_l$, divide by a sign factor $(-1)^{\dots}$ (exponent illegible), note that
the first double sum in (3.8) is equal to [expression illegible], and rearrange to get

[Equation (3.10) illegible in the source scan.]

From (3.8) we now get, for $a_{u+1} > s_{u+1}, \dots, a_l > s_l$,

[Equation (3.11) illegible in the source scan.]


Equations (3.10) and (3.11) show that the solutions $A$ are completely
determined by the values $x_{a_1,\dots,a_n}$ and the restrictions (1.7). On the other hand,
the values of $A$ also completely determine the $\hat x_{a_1,\dots,a_n}$. Hence the degrees of
freedom of $Q_A$ are equal to the number of independent parameters put equal to 0 by
the assumption. If $s_i < t_i$ for $i = 1, \dots, u$ and $s_{u+1} = t_{u+1}, \dots, s_n = t_n$, then there are in $T$
just $\prod_{i=1}^{n}(t_i-1)$ independent parameters associated with the highest order interaction,
and of these $\prod_{i=1}^{u} s_i \prod_{i=u+1}^{n}(t_i-1)$ do not occur in $T-S$. Hence $Q_A$ has

$$\prod_{i=u+1}^{n}(t_i-1)\left(\prod_{i=1}^{u}(t_i-1) - \prod_{i=1}^{u} s_i\right)$$

degrees of freedom.
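This degrees-of-freedom count is easy to tabulate. The small Python sketch below uses a hypothetical function name and assumes the factors are ordered so that the first $u$ have $s_i < t_i$.

```python
from math import prod

def df_Q_A(t, s, u):
    """Degrees of freedom of Q_A when S is the region a_i <= s_i.

    t, s: lists of t_i and s_i for factors 1..n (0-based lists);
    the first u factors have s_i < t_i, the remaining ones s_i = t_i.
    """
    tail = prod(ti - 1 for ti in t[u:])        # product over i = u+1, ..., n
    return tail * (prod(ti - 1 for ti in t[:u]) - prod(s[:u]))
```

As a check, take a 3 × 4 layout with $s_1 = 1$ and $s_2 = t_2$. Then $T-S$ is a complete 2 × 4 design, and the formula gives its $(2-1)(4-1) = 3$ interaction degrees of freedom.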
1 
We shall give the solutions explicitly for the cases $n = 2$ and $n = 3$. In the
following, $b_i$ will denote a number $\le s_i$, and $a_i$ a number $> s_i$.

$n = 2$, $s_1 < t_1$, $s_2 < t_2$:

$$A = S^1 + S^2 - S^{12},$$

[remaining formulae for $n = 2$ illegible in the source scan].
The case $s_2 = t_2$ can be treated as a complete factorial design with the levels
of the first factor running from $s_1+1$ to $t_1$.

$n = 3$, $s_1 < t_1$, $s_2 < t_2$, $s_3 < t_3$:

$$A = S^1 + S^2 + S^3 - S^{12} - S^{13} - S^{23} + S^{123},$$

[remaining formulae for $n = 3$ illegible in the source scan].


If $s_1 < t_1$, $s_2 < t_2$, $s_3 = t_3$, then in the above formulae all terms containing the
index 3 must be deleted. But the remaining quantities $A$ [symbols illegible in the source scan] can be obtained from the above
formulae by permutation of indices.

For $n = 3$, $s_2 = t_2$, $s_3 = t_3$, we get a complete design with the first index varying
only from $s_1+1$ to $t_1$.

As can be seen, the computation is not excessive and is well adapted to punched
card machines.


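The procedure given for testing the linear hypothesis (2.1) can also be adapted
to test the following more general linear hypothesis. Let $I$ be a natural set of inter-
actions and $(i_1, \dots, i_u)$ an interaction of $I$ such that $I - (i_1, \dots, i_u) = I_1$ is also a natural
set. Under the assumption that all interactions outside of $I$ are 0 in $T$ and that all
interactions $(i_1, \dots, i_u)$ are 0 in $T-S$, we wish to test the hypothesis that all inter-
actions outside $I$ and the interactions $(i_1, \dots, i_u)$ are 0 in $T$. In symbols

$$A:\; E(x_{a_1,\dots,a_n}) = \sum_{(j_1,\dots,j_\beta)\in I} \mu^{j_1,\dots,j_\beta}_{a_{j_1},\dots,a_{j_\beta}} \ \text{in } T, \qquad E(x_{a_1,\dots,a_n}) = \sum_{(j_1,\dots,j_\beta)\in I_1} \mu^{j_1,\dots,j_\beta}_{a_{j_1},\dots,a_{j_\beta}} \ \text{in } T-S;$$
$$H:\; E(x_{a_1,\dots,a_n}) = \sum_{(j_1,\dots,j_\beta)\in I_1} \mu^{j_1,\dots,j_\beta}_{a_{j_1},\dots,a_{j_\beta}} \ \text{in } T. \qquad (3.12)$$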
... we may assume $(i_1, \dots, i_u) = (1, \dots, u)$. [The remainder of this passage is largely illegible in the source scan. Recoverable fragments: "... not in $I$, $I_1$ respectively. From the identity 5.9 of Mann ..."; "... extended over those combinations $a_1, \dots, a_n$ only which occur in $T-S$"; "... for the second sum of squares we can use formulae (3.10) and (3.11) ..."; "It should be noted that $Q_H - Q_A$ ..."; "... since otherwise all combinations ... $S$. The degrees of freedom for $Q_A$ ...".]
4. TESTS WHEN a priori INFORMATION IS NOT AVAILABLE

The test of the hypothesis (3.12) can be carried out with only one observation
in each cell. The situation leading to the test (2.4) arises in particular in Case II.
In this case we wish to test hypotheses $H_i$, including hypotheses suggested by the data,
that the means of certain linear forms, whose sum of squares equals $Q_{H_i} - Q_{A_i} = Q_{S_i}$,
are 0, where $Q_{H_i} - Q_{A_i}$ is computed from a set $S_i$. We shall be able to determine a cons-
tant $K$ with the property that

$$P = P(Q_{S_i} \le K\hat\sigma^2) \ge 1-\alpha, \qquad (4.1)$$

where $P$ is the probability that the inequality $Q_{S_i} \le K\hat\sigma^2$ holds simultaneously for all
$S_i$ for which $H_i$ is true. Alternatively, we may say that the probability that a true
hypothesis will be rejected is at most $\alpha$. It should be noted that we cannot possibly
make a statement about the exact probability of rejecting a true hypothesis, since this
probability depends on the number of hypotheses which are true. For instance, if
all $H_i$ are false, the probability of rejecting a true hypothesis is 0. It should be recog-
nized, however, that the same situation also obtains when we test a single hypothesis.

A similar problem was treated by Scheffé (1953), where the statements made
concerned the mean values of individual linear forms. We shall need a generalization
of Scheffé's theorem.


Theorem 2: Let $R_k$ be a space of normally distributed random variables
of dimension $k$. (That is to say, there exist in $R_k$ random variables $\theta_1, \dots, \theta_k$ such that
$|\operatorname{cov}(\theta_i, \theta_j)| \ne 0$, while for any $k+1$ random variables $\theta_1, \dots, \theta_{k+1}$ we have $|\operatorname{cov}(\theta_i, \theta_j)| = 0$.)
Suppose $\operatorname{cov}(\theta_i, \theta_j) = c_{ij}\sigma^2$, where $c_{ij}$ is assumed to be known and $\sigma^2$ is an unknown
constant. Suppose further that $\hat\sigma^2$ is an estimate of $\sigma^2$ such that $\nu\hat\sigma^2/\sigma^2$ has the $\chi^2$ distribution
with $\nu$ d.f. and is independent of the variables in $R_k$. We shall call $\theta$ normalized if
$\sigma_\theta^2 = \sigma^2$. Let $F_\alpha(k, \nu)$ be the upper $\alpha$ point of the F distribution with $k$ and $\nu$ d.f.;
then, if $0 < r_i \le k$,

$$P = P\left((\theta_1 - E(\theta_1))^2 + \dots + (\theta_{r_i} - E(\theta_{r_i}))^2 \le kF_\alpha(k, \nu)\hat\sigma^2\right) = 1-\alpha, \qquad (4.2)$$

where $P$ is the probability that the inequality in parentheses is true for all $r_i$ and all choices
of $r_i$ independent normalized random vectors of $R_k$.


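Proof: Let $\eta_1, \dots, \eta_k$ be a basis of $R_k$ such that $\operatorname{cov}(\eta_i, \eta_j) = \operatorname{diag}(\sigma^2)$.
If $\theta = a_1\eta_1 + \dots + a_k\eta_k$, then

$$\theta - E(\theta) = \sum a_i(\eta_i - E(\eta_i)), \qquad \sigma_\theta^2 = \sum a_i^2\sigma^2.$$

If $\theta$ is normalized, then $\sum a_i^2 = 1$. Hence our theorem will follow from the following
lemma.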
Lemma 2: Let $y_1, \dots, y_k$ be $k$ real numbers. Consider for every $r$ all sets
of $r$ normalized orthogonal forms $l_i = \sum_j d_{ij}y_j$, $i = 1, \dots, r$, $\sum_j d_{ij}d_{i'j} = \delta_{ii'}$.
If $\sum_1^k y_j^2 \le a^2$, then $l_1^2 + \dots + l_r^2 \le a^2$ for all choices of $r$ and all choices $l_1, \dots, l_r$.
If $l_1^2 + \dots + l_r^2 \le a^2$ for one fixed $r$ and all choices $l_1, \dots, l_r$, then $\sum_1^k y_j^2 \le a^2$.


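Proof: Let $\sum_1^k y_j^2 \le a^2$. We can augment the $r \times k$ matrix $(d_{ij})$ to an ortho-
gonal $k \times k$ matrix $(d_{ij})$. We then have $\sum_1^k l_i^2 = \sum_1^k y_j^2$. Hence $\sum_1^k y_j^2 \le a^2$ implies
$\sum_1^r l_i^2 \le a^2$.

Let $\sum_1^r l_i^2 \le a^2$ for all choices $l_1, \dots, l_r$ ($r$ fixed). We determine an orthogonal
$(k-r) \times k$ matrix $(d_{ij})$ such that $\sum_j d_{ij}y_j = 0$. We augment this matrix to an
orthogonal $k \times k$ matrix. Then $\sum_1^k y_j^2 = \sum_1^k l_i^2 = \sum_{k-r+1}^{k} l_i^2 \le a^2$.

The theorem now follows immediately since, by Lemma 2,

$$P = P\left((\theta_1 - E(\theta_1))^2 + \dots + (\theta_{r_i} - E(\theta_{r_i}))^2 \le kF_\alpha(k, \nu)\hat\sigma^2\right)$$
$$= P\left((\eta_1 - E(\eta_1))^2 + \dots + (\eta_k - E(\eta_k))^2 \le kF_\alpha(k, \nu)\hat\sigma^2\right) = 1-\alpha.$$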
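Lemma 2 can be checked numerically. For any $r$ orthonormal rows $(d_{ij})$, the forms $l_i$ satisfy $\sum l_i^2 \le \sum y_j^2$, with equality when $y$ lies in the row space. The NumPy sketch below is illustrative: the helper names are hypothetical, and the QR decomposition is simply a convenient way to produce orthonormal rows.

```python
import numpy as np

def orthonormal_rows(r, k, rng):
    """r rows of a random orthogonal k x k matrix (rows are orthonormal)."""
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return Q[:r]

def rows_spanning(y, r, rng):
    """An r x k matrix with orthonormal rows whose first row is +/- y/|y|,
    so the forms l_i = sum_j d_ij y_j attain sum l_i^2 = sum y_j^2."""
    k = len(y)
    A = np.column_stack([y, rng.standard_normal((k, k - 1))])
    Q, _ = np.linalg.qr(A)      # first column of Q is +/- y / |y|
    return Q.T[:r]
```

Because $\sum l_i^2$ never exceeds $\sum y_j^2$ and the bound is attained, bounding $\sum y_j^2$ bounds every choice of the $l_i$ at once. That is why a single $F$ point suffices in (4.2).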


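By p. 142 of Mann (1949), we can find an orthogonal set of normalized linear forms $l^{i_1,\dots,i_\alpha}_{\gamma}$,
$\gamma = 1, \dots, \prod_{j=1}^{\alpha}(t_{i_j}-1)$, $\alpha = 0, \dots, n$, where $i_1, \dots, i_\alpha$ runs through all choices $i_1 < \dots < i_\alpha$,
and where each $l^{i_1,\dots,i_\alpha}_{\gamma}$ is a component of the interaction between the factors
$i_1, \dots, i_\alpha$ (for the definition see Mann (1949), p. 140).
Every linear form $l$ of the observations can then be written in the form

$$l = \sum_{\alpha=0}^{n}\ \sum_{i_1<\dots<i_\alpha}\ \sum_{\gamma} c^{i_1,\dots,i_\alpha}_{\gamma}\, l^{i_1,\dots,i_\alpha}_{\gamma}. \qquad (4.3)$$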


We modify the definition of interaction component so that 0 is a component of
every interaction.

Theorem 3: Let $l$ be a linear form of the observations such that $E(l) = 0$
if $\mu^{i_1,\dots,i_\alpha}_{b_{i_1},\dots,b_{i_\alpha}} = 0$ for a set $I$ of interactions $(i_1, \dots, i_\alpha)$ and all possible values $b_{i_1}, \dots, b_{i_\alpha}$.
Then $l$ is a linear combination of components of interactions from the set $I$.

Proof: It is easy to see that the expectation of a component of the interaction
between $i_1, \dots, i_\alpha$ is a linear form in the $\mu^{i_1,\dots,i_\alpha}_{b_{i_1},\dots,b_{i_\alpha}}$. Hence, by assumption,

$$\sum_{(i_1,\dots,i_\alpha)\notin I}\ \sum_{\gamma} c^{i_1,\dots,i_\alpha}_{\gamma}\, E\left(l^{i_1,\dots,i_\alpha}_{\gamma}\right) = 0$$

whenever the $\mu$ of the interactions in $I$ are 0.
Since the $E(l^{i_1,\dots,i_\alpha}_{\gamma})$ are independent linear forms in the $\mu^{i_1,\dots,i_\alpha}_{b_{i_1},\dots,b_{i_\alpha}}$, we can, for $(i_1, \dots, i_\alpha) \notin I$,
give them any arbitrary values, whence $c^{i_1,\dots,i_\alpha}_{\gamma} = 0$ for $(i_1, \dots, i_\alpha) \notin I$ and all values
of $\gamma$, which is just the assertion of Theorem 3.
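Theorem 4: If $E(x_\alpha) = \sum_{i=1}^{k} g_{\alpha i}\beta_i$, $\alpha = 1, \dots, N$, and the $\hat\beta_i$ are (not necessarily
unique) least square estimates of the $\beta_i$, then $E\left(\sum_i g_{\alpha i}\hat\beta_i\right) = E(x_\alpha)$.

Proof: Let

$$G = (g_{\alpha i}), \qquad x = (x_1, \dots, x_N)', \qquad \beta = (\beta_1, \dots, \beta_k)'.$$

The assumption then reads $E(x) = G\beta$. The least square equations read $G'x = G'G\hat\beta$,
and $\hat\beta = Px$, where the matrix $P$ satisfies the equation $G'GP = G'$. Now $GP$ is
symmetric [Mann (1960), Theorem 1], and hence $GPG = P'G'G = G$. Therefore

$$E(x - G\hat\beta) = E(x - GPx) = E(x) - GPG\beta = E(x) - G\beta = 0.$$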
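Theorem 4 can be illustrated with a deliberately over-parameterized one-way layout, where least square estimates are not unique. In this NumPy sketch (an illustrative example, not from the paper), $P$ is first taken as the Moore-Penrose inverse, which is one solution of $G'GP = G'$. Adding any matrix whose columns lie in the null space of $G$ gives another solution with the same regression values.

```python
import numpy as np

# Over-parameterized one-way layout: 3 groups x 2 observations, columns =
# intercept + 3 group effects, so G is 6 x 4 of rank 3 and the least square
# equations G'G beta = G'x have many solutions.
groups = np.repeat([0, 1, 2], 2)
G = np.column_stack([np.ones(len(groups)), np.eye(3)[groups]])

# The Moore-Penrose inverse is one matrix P with G'GP = G'.
P = np.linalg.pinv(G)

# Another solution: add columns from the null space of G (G @ null_vec = 0).
null_vec = np.array([1.0, -1.0, -1.0, -1.0])
P_alt = P + np.outer(null_vec, np.arange(6.0))

def fitted_values(P_matrix, x):
    """Regression values G @ beta_hat for the least square solution beta_hat = P x."""
    return G @ (P_matrix @ x)
```

The regression values are linear in $x$. So $E(G\hat\beta) = GPG\beta = G\beta$ for either choice of $P$, as Theorem 4 asserts.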
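Theorem 5: Let $A$ be the region $1 \le a_1 \le t_1, \dots, 1 \le a_n \le t_n$, and let $B$
be any subregion of $A$. Let $I$ be a natural set of interactions, and let $\hat\mu^{i_1,\dots,i_\alpha}_{a_{i_1},\dots,a_{i_\alpha}}$, $(i_1, \dots, i_\alpha) \in I$,
be the least square estimates obtained by minimizing

$$\sum_{B}\left(x_{a_1,\dots,a_n} - \sum_{I}\mu^{i_1,\dots,i_\alpha}_{a_{i_1},\dots,a_{i_\alpha}}\right)^2,$$

and let $\hat x_{a_1,\dots,a_n}$ be the corresponding regression values. Then

$$x_{a_1,\dots,a_n} - \hat x_{a_1,\dots,a_n}$$

is a linear combination of components of interactions not in $I$.

Proof: Since $I$ is a natural set,

$$E(x_{a_1,\dots,a_n}) = \sum_{I}\mu^{i_1,\dots,i_\alpha}_{a_{i_1},\dots,a_{i_\alpha}} \quad \text{in } A \qquad (4.4)$$

implies the same representation in $B$. Hence, by Theorem 4, $E(\hat x_{a_1,\dots,a_n}) = E(x_{a_1,\dots,a_n})$ if (4.4)
holds in $A$. Theorem 5 now follows from Theorem 3.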


Now consider a situation where an independent estimate $\hat\sigma^2$ of $\sigma^2$ is available
and we compute $Q_{S_i} = Q_{H_i} - Q_{A_i}$ for various regions $S_i$, including some suggested by
the data, where $Q_{H_i}$ and $Q_{A_i}$ are computed under the linear hypothesis (3.12) with $S$
replaced by $S_i$. The computation shows that $Q_{S_i}$ is a sum of squares of linear forms
in the $\hat\mu$, which themselves are linear forms in the $x_{a_1,\dots,a_n} - \hat x_{a_1,\dots,a_n}$. It follows
thus from Theorem 5 that $Q_{S_i} = \sum_{j=1}^{u_i} l_j^2$, where the $l_j$ are components of the interac-
tion $(1, \dots, v)$. The space of these interactions has the dimension

$$k = \prod_{i=1}^{v}(t_i - 1).$$


Now let $H_i$ be the hypothesis $E(l_j) = 0$, $j = 1, \dots, u_i$. Then, by Theorem 2,

$$P = P\left(Q_{S_i} \le kF_\alpha(k, \nu)\hat\sigma^2\right) \ge 1-\alpha, \qquad (4.5)$$

where $P$ is the probability that the inequality in parentheses holds for all values $i$ for
which $H_i$ is true.

It should be noted that $Q_{S_i}$ for every $S_i$ is a summand of the $(1, \dots, v)$ inter-
action sum of squares of $T$. Hence, if $H_i$ is rejected for any $i$, then the $(1, \dots, v)$ inter-
action sum of squares in $T$ will itself be significant on the level $\alpha$, and (4.5) will serve
only to locate regions $S_i$ from which a significant contribution to the $(1, \dots, v)$ inter-
action sum of squares in $T$ arises. The situation is quite different in Case I, where
we test only one region $S$. In this case the value of $k$ in (4.5) is the number of de-
grees of freedom of $Q_S$, and it may very well happen that the hypothesis H of (3.12)
will be rejected even if the $(1, \dots, v)$ interaction sum of squares in $T$ is not significant
on the level $\alpha$.

One is also tempted to compare $Q_H - Q_A$ and $Q_A$ of the hypothesis (3.12) even
if the assumption is not known to be true. If the statistic $F$ of (2.2) is large, this would
be taken to mean that the contribution of the region $S$ to the interaction sum of
squares is large compared to that of $T-S$. However, it should be kept in mind that
$F$ is then the ratio of two non-central $\chi^2$ and thus does not have the F distribution.
The value of $F$ in this case may be very useful as a basis for conjectures and as a lead
for further research, but it cannot be used for an exact test.


5. POWER OF THE TEST

We shall now show that the test (2.4) is at least as powerful as the test for
highest order interactions in $T$ if the interactions in $T-S$ are 0.

Let $W$ be a region in the $n$-dimensional Euclidean space $R_n$, and let $F_1$, $F_2$ be two distri-
bution functions in $R_n$. The integral $\int_W dF_1 = P_1(W)$ is called the size of $W$.
The integral $\int_W dF_2 = P_2(W)$ will be called the power of $W$.
A family $L$ of regions in $R_n$ is called an additive family if sums, intersec-
tions and differences of regions in $L$ are again in $L$.
In the following definitions and theorems all regions considered will be members
of an additive family $L$. To avoid cumbersome language we shall simply
speak of regions. A most powerful region $W$ will mean a region of $L$ such
that $P_2(W) \ge P_2(W')$ for all $W' \in L$ for which $P_1(W') = P_1(W)$. We assume
that our regions satisfy the following condition:
(i) If $W$ is any region of size $\alpha$ and if $\beta < \alpha$, then $W$ has a subregion of size $\beta$.
Condition (i) obviously implies
(ii) If $W$ is any region of size $\alpha$ and if $\alpha = \sum_{i=1}^{t}\beta_i$, then $W = \sum W_i$, where $W_i$ has size $\beta_i$.

Lemma 3: Let $W$ be a most powerful region of size $\alpha$ and $W_1$ any subregion
of $W$ of size $\beta < \alpha$. Let $K$ be any region of size $\beta$ with $K \cap W$ empty. Then $P_2(K) \le P_2(W_1)$.

Otherwise the region $W - W_1 + K$ would have size $\alpha$ and higher power than $W$.


Lemma 4: Let $W$ be a most powerful region of size $\alpha$ and $W^*$ any
subregion of $W$ of size $\beta < \alpha$. Let $K$ be any region of size $k\beta$ such that $K \cap W$ is
empty; then

$$kP_2(W^*) \ge P_2(K). \qquad (5.1)$$

Proof: Choose $\varepsilon > 0$ arbitrarily. Put $n = [1/\varepsilon] + 1$. Divide $K$ into
$n$ regions of size $k\beta/n$. In at least one of these regions, say in $D_1$, we must have $P_2(D_1) < \varepsilon$.
Hence, if $0 \le \delta \le k\beta/n$, there is a region $D \subset K$ such that $P_1(D) = \delta$, $P_2(D) < \varepsilon$.

Now let $\beta = m\delta$, $m$ integral, $k\beta = km\delta = [km]\delta + \delta_1$, where $\delta_1 < \delta$. Let $W^* = \sum_{i=1}^{m} W_i$,
where $W_i$ has size $\delta$. Let $K = \sum_{i=1}^{[km]} K_i + D$, where $P_1(K_i) = \delta$, $P_1(D) = \delta_1$, $P_2(D) < \varepsilon$.
Let $p = \min P_2(W_i)$. By Lemma 3 we have $p \ge P_2(K_i)$, whence

$$P_2(K) \le [km]p + \varepsilon \le kP_2(W^*) + \varepsilon.$$

Since $\varepsilon$ is arbitrary, (5.1) must hold.

Theorem 6: Let $W$ be a most powerful region of size $\alpha$. Let $W_1, \dots, W_t$
be $t$ regions with $P_1(W_i) = \alpha_i$, and let $p_1, \dots, p_t$ be $t$ non-negative numbers such that
$\sum p_i\alpha_i = \alpha$, $\sum p_i = 1$.
Then $\sum p_iP_2(W_i) \le P_2(W)$.

Proof: Suppose first that $W \not\subset W_1$ and $W_1 \not\subset W$. Let $P_1(W - W \cap W_1) = \beta_1$,
$P_1(W_1 - W \cap W_1) = \beta_2$. Suppose $\beta_1 \ge \beta_2$. Then $P_2(W_1)$ is not decreased if $W_1$
is replaced by the region $W_1 \cap W + W^*$, where $W^*$ is a subregion of $(W - W \cap W_1)$ of
size $\beta_2$. If $\beta_1 < \beta_2$, choose $W^* \subset W_1 - W \cap W_1$ of size $\beta_1$ and replace it by
$W - W \cap W_1$.

Without loss of generality we may, therefore, assume that either $W_i \supset W$
or $W_i \subset W$.

Now suppose $W_1 \supset W$, $W_2 \subset W$. Let $P_1(W_1 - W) = \beta_1$, $P_1(W - W_2) = \beta_2$.
If $p_1\beta_1 \ge p_2\beta_2$, replace $W_2$ by $W$ and subtract from $W_1$ a region $W^* \subset W_1 - W$ of
size $(p_2/p_1)\beta_2$. Let the two new regions be $W_1'$, $W_2'$. Then

$$p_1P_1(W_1') + p_2P_1(W_2') = p_1P_1(W_1) + p_2P_1(W_2)$$

and (Lemma 4)

$$p_1P_2(W_1') + p_2P_2(W_2') = p_1P_2(W_1) + p_2P_2(W_2) - p_1P_2(W^*) + p_2P_2(W - W_2) \ge p_1P_2(W_1) + p_2P_2(W_2).$$

If $p_1\beta_1 < p_2\beta_2$, replace $W_1$ by $W$ and add a region $W^* \subset W - W_2$ of size $p_1\beta_1/p_2$ to $W_2$.
We can continue this process until either $W_i \supset W$ for all $W_i$ or $W \supset W_i$ for
all $W_i$, but then the equation $\sum p_iP_1(W_i) = \alpha$ shows that all $W_i$ are of size $\alpha$, and we
can replace them by $W$. This proves Theorem 6.


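Corollary to Theorem 6: Let $(W_i)$ be an infinite sequence of regions and
$(p_i)$ a sequence of non-negative numbers such that $\sum p_iP_1(W_i) = P_1(W)$ and $\sum p_i = 1$,
where $W$ is a most powerful region. Then

$$\sum p_iP_2(W_i) \le P_2(W).$$

Proof: Find $N$ so that $\sum_{N+1}^{\infty} p_i < \varepsilon$. Then

$$\sum p_iP_2(W_i) \le \sum_{1}^{N} p_iP_2(W_i) + \varepsilon \le P_2(W) + \varepsilon.$$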


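Theorem 7: Let $S$ be a space and $Q$ a probability measure defined over $S$.
For every $z \in S$ let $W(z)$ be a region in $R_n$. Let further

$$\int P_1(W(z))\,dQ = \alpha,$$

and assume that $\int P_2(W(z))\,dQ$ exists. If $W$ is a most powerful region of size $\alpha$, then

$$\int P_2(W(z))\,dQ \le P_2(W). \qquad (5.2)$$

Proof: For every $Q$-measurable set $S_i$ let $Q(S_i)$ be the $Q$ measure of $S_i$. For every
$\varepsilon_1 > 0$, $\varepsilon > 0$ we can find a covering $(S_i)$ of $S$ such that

$$\left|\int P_2(W(z))\,dQ - \sum P_2(W(z_i))\,Q(S_i)\right| < \varepsilon, \qquad z_i \in S_i,$$

and $\alpha = \int P_1(W(z))\,dQ = \sum P_1(W(z_i))\,Q(S_i) + \eta$, $|\eta| < \varepsilon_1$.
If $\eta$ is positive, apply the Corollary to Theorem 6 to a most powerful region of
size $\alpha - \eta$. If $\eta$ is negative, choose a most powerful region $W^*$ of size $\alpha - \eta$ contain-
ing $W$. Then

$$\int P_2(W(z))\,dQ - \varepsilon \le \sum P_2(W(z_i))\,Q(S_i) \le P_2(W^*) = P_2(W) + P_2(W^* - W) \le P_2(W) + \varepsilon.$$

This proves (5.2).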
We now return to the test (2.4).
The highest interaction sum of squares in $T$ is our $Q_H$, and we have

$$Q_H = Q_A + Q_S = \sum l_i^2 + \sum m_j^2, \qquad (5.3)$$
where the $l_i$ are normalized orthogonal forms whose means are 0 by assumption,
and the hypothesis H states that $E(m_j) = 0$. Write $\chi_1^2 = \sum l_i^2/\sigma^2$, $\chi_2^2 = \sum m_j^2/\sigma^2$ and $\chi_0^2 = \nu\hat\sigma^2/\sigma^2$.
In the test (2.4) we form $F$, which is proportional to $\chi_2^2/\chi_0^2$. In testing the interaction sum of
squares in $T$ we form $F_1$, which is proportional to $(\chi_1^2 + \chi_2^2)/\chi_0^2$. We shall show that the region
$F > F_0$, or $\chi_2^2 > k_0\chi_0^2$, is at least as powerful as the region $F_1 > F_{10}$ with respect to all
alternatives $\sum E(m_j)^2 > 0$.

The region $F_1 > F_{10}$ is equivalent to

$$\chi_2^2 > k_1\chi_0^2 - \chi_1^2, \qquad (5.4)$$

where $P(\chi_2^2 > k_0\chi_0^2) = P(\chi_2^2 > k_1\chi_0^2 - \chi_1^2)$,
provided $E(m_j) = 0$. We shall use a result of P. L. Hsu (1941) that the critical region
$F > F_0$ maximizes the power among all regions $W$ of equal size whose power depends
only on $\lambda = \sum E(m_j)^2$. Consider all regions (5.4) for varying values of $\chi_1^2$ as regions
in the space of $\chi_0^2$, $\chi_2^2$, and consider all regions obtainable from them by the operations of
addition, intersection and subtraction. The power function of any region of this
additive family $L$ of regions depends only on $\lambda$ (see e.g. Mann (1949), pp. 65, 66).
Hence, if $P(F > F_0) = \alpha$, then the region $F > F_0$ is a most powerful region of size $\alpha$
of the family $L$. It is not difficult to check that condition (i) of Theorem 7 is satis-
fied by $L$. Now in Theorem 7 let $F_1$ be the d.f. of $(\chi_0^2, \chi_2^2)$ if $E(m_j) = 0$ and $F_2$ its distri-
bution if $\sum E(m_j)^2 = \lambda \ne 0$. Let $W(z)$ be the region $\chi_2^2 > k_1\chi_0^2 - z$ and let $Q$ be
the cumulative distribution function of $\chi_1^2$. Then Theorem 7 shows immediately that the region
$F > F_0$ is at least as powerful as a region (5.4), or the equivalent region $F_1 > F_{10}$, of the
same size. Actual calculation in special cases, using the tables in Tang (1938) or
Mann (1949), shows that there is actually an appreciable gain in power.
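The gain in power can also be seen by simulation instead of tables. The NumPy sketch below draws the $\chi^2$ variables of (5.3) and (5.4) and estimates both critical values by Monte Carlo at level $\alpha$. It then compares the power of the test (2.4) with that of the overall interaction test in $T$. The parameter names `d_l` and `d_m` (the numbers of forms $l_i$ and $m_j$) and the values used are assumptions chosen for illustration.

```python
import numpy as np

def power_comparison(d_l, d_m, nu, lam, alpha=0.05, n_sim=200_000, seed=0):
    """Monte Carlo power of test (2.4) and of the overall interaction test in T.

    d_l: number of forms l_i (means 0 under the assumption)
    d_m: number of forms m_j (tested by H)
    nu:  degrees of freedom of the independent variance estimate
    lam: sum of E(m_j)^2 / sigma^2 under the alternative
    """
    rng = np.random.default_rng(seed)

    def statistics(nonc):
        chi0 = rng.chisquare(nu, n_sim)           # nu * sigma_hat^2 / sigma^2
        chi1 = rng.chisquare(d_l, n_sim)          # sum l_i^2 / sigma^2 (central)
        chi2 = (rng.noncentral_chisquare(d_m, nonc, n_sim) if nonc > 0
                else rng.chisquare(d_m, n_sim))   # sum m_j^2 / sigma^2
        F = (chi2 / d_m) / (chi0 / nu)                    # test (2.4)
        F1 = ((chi1 + chi2) / (d_l + d_m)) / (chi0 / nu)  # interaction test in T
        return F, F1

    F_null, F1_null = statistics(0.0)
    F0 = np.quantile(F_null, 1 - alpha)      # simulated critical values
    F10 = np.quantile(F1_null, 1 - alpha)
    F_alt, F1_alt = statistics(lam)
    return float(np.mean(F_alt > F0)), float(np.mean(F1_alt > F10))
```

Take 20 forms $l_i$, 2 forms $m_j$, $\nu = 10$ and $\lambda = 10$. The simulated power of (2.4) is then far above that of the overall test, as the argument above leads one to expect.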


"REFERENCES 
Hsu, P. L. (1941): Analysis of variance from the power-function standpoint. Biometrika, 32, 62-69.
Mann, H. B. (1949): Analysis and Design of Experiments, Dover Publications, Inc.
Mann, H. B. (1960): The algebra of a linear hypothesis. Ann. Math. Stat., 31, 1-15.
Scheffé, Henry (1953): A method for judging all contrasts in the analysis of variance. Biometrika, 40,
87-104.
Tang, P. C. (1938): The power function of the analysis of variance tests with tables and illustrations
of their use. Statistical Research Memoirs, 2, 126-149 and tables.

Paper received: November, 1960.
LISTING OF BIB DESIGNS FROM r = 16 TO 20

By D. A. SPROTT
University of Waterloo

  v    b   r   k   λ   Remarks
 65   80  16  13   3
177  236  16  12   1
 45   60  16  12   4
 33   48  16  11   5
145  232  16  10   1
 25   40  16  10   6
113  226  16   8   1
 29   58  16   8   4
 49  196  16   4   1
 17   68  16   4   3
 33  176  16   3   1
273  273  17  17   1   P(2,16):1
137  137  17  17   2   *
 69   69  17  17   4
 35   35  17  17   8
256  272  17  16   1
120  136  17  15   2
 52   68  17  13   4
 18   34  17   9   8
120  255  17   8   1
 35   85  17   7   3
 18   51  17   6   5
 35  119  17   5   2
 52  221  17   4   1
 18  102  17   3   2
307  307  18  18   1   P(2,17):1
154  154  18  18   2
103  103  18  18   3
 52   52  18  18   6
 35   35  18  18   9
289  306  18  17   1   E(2,17):1
136  153  18  16   2
 96  114  19  16   3
153  323  19   9   1
 96  304  19   6   1
 20   95  19   4   3
 39  247  19   3   1
191  191  20  20   2
361  380  20  19   1   E(2,19):1
171  190  20  18   2
141  188  20  15   2
 21   28  20  15  14

[Further entries of this page, including other designs with r = 16 to 20 and most of the Remarks column, are illegible in the source scan.]

* Non existent.
** Derived from the Steiner system S(5,8,24), Sprott (1955).
Series [designations illegible] from Bose (1939).
Series [designations illegible] from Bose (1942).
Series [designations illegible] from Sprott (1954).
Series [designations illegible] from Sprott (1956).
LISTING OF BIB DESIGNS FROM r = 16 TO 20 (continued)

  v    b   r   k   λ
 21   70  20   6   5
 81  324  20   5   1
 17   68  20   5   5
 61  305  20   4   1
 31  155  20   4   2
 21  105  20   4   3
 11   55  20   4   6
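Each row of such a listing can be screened with the two standard parameter identities $bk = vr$ and $\lambda(v-1) = r(k-1)$, together with Fisher's inequality $b \ge v$. A short Python sketch follows; the function names are hypothetical. These conditions are necessary only. For instance, $(v, b, r, k, \lambda) = (137, 137, 17, 17, 2)$, which the listing marks non-existent, satisfies all three.

```python
def bib_necessary(v, b, r, k, lam):
    """Necessary (not sufficient) conditions for a BIB design (v, b, r, k, lambda)."""
    return (b * k == v * r                      # two counts of plot incidences
            and lam * (v - 1) == r * (k - 1)    # pairs containing a fixed treatment
            and b >= v)                         # Fisher's inequality

def v_from(b, r, k):
    """Recover v from b, r, k via bk = vr (None if not an integer)."""
    v, rem = divmod(b * k, r)
    return v if rem == 0 else None
```

A row that fails these checks, such as $(52, 68, 17, 13, 1)$, points to a misread entry; $\lambda = 4$ restores the identities.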


REFERENCES

Bose, R. C. (1939): On the construction of balanced incomplete block designs. Ann. Eugen., 10, 353-399.
Bose, R. C. (1942): On some new series of balanced incomplete block designs. Bull. Cal. Math. Soc., 34,
17-31.
Rao, C. R. (1961): A study of BIB designs with replications 11 to 15. Sankhyā, 23, 117-127.
Sprott, D. A. (1954): A note on balanced incomplete block designs. Can. J. Math., 6, 341-346.
Sprott, D. A. (1955): Balanced incomplete block designs and tactical configuration. Ann. Math. Stat., 26,
752-758.
Sprott, D. A. (1956): Some series of balanced incomplete block designs. Sankhyā, 17, 185-192.

Paper received: November, 1961.


CORRIGENDA

The Transformation of a Distribution ... [remainder of the title and the authors' names illegible in the source scan]
Sankhyā, Series A, 23, 309-324

p. 321, line 11 from below: [correction illegible in the source scan]
p. 321, line 8 from below: [correction illegible in the source scan]
p. 322, line 2 from below: [correction illegible in the source scan]
p. 323, line 11: [correction illegible in the source scan]
p. 323, line 12: [correction illegible in the source scan]


On a Problem of Bartholomew in Life Testing: By P. S. Swamy and
S. A. D. C. Doss, Sankhyā, Series A, 23, 225-230.

(1) Equation (2.4): should read as [expression illegible in the source scan].
(2) Equation (2.9): the last part should read as [expression illegible in the source scan].
(3) Page 227, line 7 from below: [correction illegible in the source scan].
(4) Page 228, last expression: should read as [expression illegible in the source scan].
(5) Page [illegible], first expression: should read as [expression illegible in the source scan].
IN MEMORIAM
SIR RONALD AYLMER FISHER
(1890-1962)

Sir Ronald Fisher, F.R.S., passed away on 29 July 1962 at Adelaide, Australia,
after an operation. A cable communicating the news was placed before the meeting
of the Council of the Indian Statistical Institute held on 31 July 1962 at New Delhi
by the President, Dr. C. D. Deshmukh. Professor P. C. Mahalanobis spoke briefly
about Sir Ronald Fisher's great contributions to the advancement of science by
developing statistics as a new technology which is finding increasing applications
in all natural and social sciences. He laid the foundations of modern statistical
theory and devised refined tools for applications. After retirement from the Cam-
bridge University he was working as a Consultant to the Council of Scientific and
Industrial Research of Australia at Adelaide. He had been in personal touch with
the statistical activities in Calcutta from the early twenties and had maintained
close personal contact with the workers and activities of the Indian Statistical Institute.
He visited the Institute on eight occasions, the last two visits being from December
1960 to February 1961 and, most recently, from December 1961 to February 1962.
He was the most outstanding statistician of the world and a life-long well-wisher of
the Institute.


The following resolution was passed by the Council, all members standing.

"The Council places on record the sense of profound sorrow with which it has
heard of the passing away of Sir Ronald Aylmer Fisher on 29 July 1962 at Adelaide,
Australia. He ushered in a new epoch in the history of statistical science and was
the leader of a movement which made statistics a new technology of the present age.
He established contacts with the statistical workers in Calcutta in the early twenties;
served as a member of the Examination Committee of the Indian Statistical Institute
in 1936; served as the Chairman of the Review Committee of the National Sample
Survey in 1959-57; and gave continuing support to the Institute at both national
and international levels. He visited the Institute for the first time in 1937 and
came to the Institute on seven subsequent occasions, the last visit being from December
1961 to February 1962. He was elected an Honorary Fellow of the Indian Statistical
Institute in 1937, and was awarded the honorary degree of D.Sc. at the first Convo-
cation of the Institute in February 1962, on which occasion he delivered the Convocation
Address. Through his several visits to the Institute, personal contacts with its
workers, scientific contributions to Sankhyā: the Indian Journal of Statistics, visits
to other scientific centres and advisory work in India, he helped in a most significant
way in the development of the integrated research, training and project programmes
of the Indian Statistical Institute and in its emergence as a higher technological
institution of a new type, and also generally in the advancement of statistics
in India."

A condolence meeting was held on 30 July 1962 at 12 noon at the Indian
Statistical Institute, 203 Barrackpore Trunk Road, Calcutta. The meeting was
attended by all students and workers of the Institute. Shri S. Basu, the Joint
Director, and Dr. C. R. Rao, Head of the Research and Training School, spoke at the
meeting.

Dr. C. R. Rao mentioned that with the passing away of Sir Ronald Fisher
the world has lost an outstanding scientist. Sir Ronald's contributions were not
confined to methodological aspects of statistics, which he considerably enlarged and
refined, but covered applied fields such as genetics and biometry and, in general,
influenced scientific thought of the present century. He laid the foundations of
modern statistical theory by the introduction of small sample distributions, the theory
of efficient and sufficient estimation, the likelihood principle and fiducial inference.
He introduced the entirely new discipline of statistical design of experiments and
analysis of variance, by which scientific data are collected in an efficient way and
analysed to yield valid inferences. Sir Ronald took a special interest in the teaching
and research activities of the Institute and, during each of his several visits, he con-
ducted seminars on current problems in statistics and left new ideas and new problems
for the staff of the Institute to work on. During his last visit to the Institute, four
months ago, he was actively working on some examples to illustrate in detail the new
concepts he had introduced and to throw new light on the theory of statistical
inference. A great man of science has gone, and no one knows whether the void
will be filled again.

The Institute was closed for the day as a mark of respect to Sir Ronald Fisher.
DENSITY IN THE LIGHT OF PROBABILITY THEORY-II*
By E. M. PAUL
Indian Statistical Institute

SUMMARY. Let $(X_n)$ be a sequence of abstract spaces, each $X_n$ consisting of the points $0, 1, 2, \dots$.
At the point $r$ in $X_n$ we place probability $(1/q_n^r)(1 - 1/q_n)$, $q_n$ being the $n$-th prime number. Let $X$ be the
product space $X_1 \times X_2 \times \cdots$ and let $P$ be the product measure.

Let $J$ be a sequence $\{j_m\}$ of positive integers. Let $S$ be any set of positive integers. $M^J(S)$ is the
set of vectors $(x_1, x_2, \dots) \in X$ such that $\prod_{i\le n} q_i^{x_i} \in S$ for infinitely many $n \in J$. $M_J(S)$ is the set of
vectors $(x_1, x_2, \dots) \in X$ such that $\prod_{i\le n} q_i^{x_i} \in S$ for all sufficiently large $n \in J$. We prove that
$P[M_J(S)] \le \delta_L(S) \le \delta^U(S) \le P[M^J(S)]$ for all sets $S$ if and only if $\log j_{m+1}/\log j_m$ is bounded as $m \to \infty$.
$\delta_L$ and $\delta^U$ stand for lower and upper logarithmic densities, respectively.

Let $f$ be a finite function defined on the set of positive integers. Suppose, for a $J$ satisfying the
condition above, $\lim_{n \in J} f\left(\prod_{i\le n} q_i^{x_i}\right) = g(x)$ exists with probability 1. Then $f$ has a distribution,
and this is the same as that of $g(x)$; we employ logarithmic density.


GENERALIZATION OF THE MAGNIFICATION THEOREM 


We now generalize the magnification theorem in the case of the special example discussed in the previous paper (Paul, 1962). Let $J$ be a class of positive integers. Let $S$ be an arbitrary set of positive integers. We define the upper $J$-magnification of $S$, $M^J(S)$, to be the set of vectors $(x_1, x_2, \ldots)$ such that $q_1^{x_1} \cdots q_n^{x_n} \in S$ for infinitely many values of $n \in J$. The lower $J$-magnification of $S$, $M_J(S)$, is defined to be the set of vectors $(x_1, x_2, \ldots)$ such that $q_1^{x_1} \cdots q_n^{x_n} \in S$ for all sufficiently large values of $n$ in $J$. Obviously, $M_*(S) \subset M_J(S) \subset M^J(S) \subset M^*(S)$, where $M_*(S)$ and $M^*(S)$ denote the lower and upper magnifications of $S$ of the previous paper. This raises the question of obtaining sharper estimates for lower and upper logarithmic densities.

Let $J$ consist of $j_1, j_2, \ldots$ in ascending order. We shall prove the following theorem.

Theorem: $P[M_J(S)] \le \delta_L(S) \le \delta^U(S) \le P[M^J(S)]$ for all sets $S$ if and only if $\log j_{n+1}/\log j_n$ remains bounded as $n \to \infty$.
* Part I of this paper has been published in

The proof of the 'if' part is similar to the proof given by the author (Paul, 1962). Let us call the space $X_1 \times X_2 \times \cdots \times X_{j_1}$ by the name $Y_1$, $X_{j_1+1} \times \cdots \times X_{j_2}$ by the name $Y_2$, and so on. In each space $X_p$ let us introduce the measure described earlier by the author (Paul, 1962), and in each space $Y_n$ let us introduce the product measure. $X$ may be looked upon as the space $Y_1 \times Y_2 \times \cdots$. Instead of the spaces $X_1, X_2, \ldots$ (Paul, 1962; Section 2) we now have $Y_1, Y_2, \ldots$. We treat the point $(0, 0, \ldots, 0)$ of $Y_n$ as the element 0 (the role played by the point 0 of $X_n$ in the previous paper). Let $(x_1, x_2, \ldots, x_n, 0, 0, \ldots) \in I \subset X$. We associate with it the number $q_1^{x_1} q_2^{x_2} \cdots q_n^{x_n}$, and for $C \subset I$ we define $\delta^U(C)$ to be the upper logarithmic density of the corresponding set of positive integers. The space $Y_1 \times Y_2 \times \cdots$ and $\delta^U$ satisfy Postulates (A) to (F) of Section 2 and condition G of Section 3 of the previous paper (Paul, 1962). The proof that condition H also holds is similar to the proof given in Section 6 of the previous paper (Paul, 1962) but requires a little explanation. Let $B$ be a right-complete set in $I \subset Y_1 \times Y_2 \times \cdots$. Let $(x_1, \ldots, x_m, 0, 0, \ldots)$ be a basic vector of $B$ with $x_m \ne 0$, and let $j_N < m \le j_{N+1}$. Let
$$f(s) = \frac{(1 - q_1^{-s}) \cdots (1 - q_{j_{N+1}}^{-s})}{(q_1^{x_1} \cdots q_m^{x_m})^{s}}.$$
We are interested in proving that $\sum f(s)$, over all basic vectors, is continuous on $[1, 2]$. Since $m$ may be less than $j_{N+1}$, our previous argument does not go through directly. So we introduce
$$\phi(s) = \frac{(1 - q_1^{-s}) \cdots (1 - q_m^{-s})}{(q_1^{x_1} \cdots q_m^{x_m})^{s}}.$$
Then, for $1 \le s \le 2$,
$$1 \ge \frac{f(s)}{\phi(s)} = \prod_{r=m+1}^{j_{N+1}} (1 - q_r^{-s}) \ge \prod_{r=j_N+1}^{j_{N+1}} \Big(1 - \frac{1}{q_r}\Big) \sim \frac{\log q_{j_N}}{\log q_{j_{N+1}}}$$
by Mertens' theorem, and the right-hand side stays bounded away from 0 by the hypothesis on $J$.

We now apply the argument given in the previous paper (Paul, 1962) and prove that $\sum \phi(s)$ is continuous on $[1, 2]$. Continuity of $\sum f(s)$ on $[1, 2]$ follows immediately, and the 'if' part is proved.
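A quick numerical check of the Mertens step (a sketch, not part of the paper): for a block of consecutive primes, the product of $1 - 1/q_r$ over $j_N < r \le j_{N+1}$ should be close to $\log q_{j_N}/\log q_{j_{N+1}}$. The helper names and the index pairs are illustrative choices, not the author's.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

def block_product(primes, j_lo, j_hi):
    """prod_{j_lo < r <= j_hi} (1 - 1/q_r), where q_r = primes[r - 1]."""
    result = 1.0
    for q in primes[j_lo:j_hi]:
        result *= 1.0 - 1.0 / q
    return result

def mertens_ratio(primes, j_lo, j_hi):
    """log q_{j_lo} / log q_{j_hi}: the Mertens approximation to the block product."""
    return math.log(primes[j_lo - 1]) / math.log(primes[j_hi - 1])

primes = primes_up_to(200_000)          # 17984 primes, enough for j <= 15000
for j_lo, j_hi in [(25, 1000), (100, 5000), (1000, 15000)]:
    print(j_lo, j_hi, block_product(primes, j_lo, j_hi), mertens_ratio(primes, j_lo, j_hi))
```

The ratio of the two printed columns drifts towards 1 as the block moves to larger primes, which is all the argument needs: a lower bound for $f(s)/\phi(s)$ that does not depend on the basic vector.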


Before proving the 'only if' part, we give an example of a $J$ and a right-complete set in $I \subset Y_1 \times Y_2 \times \cdots$ such that condition H (Paul, 1962) is violated. Of course, in this case $\log j_{n+1}/\log j_n$ will be unbounded. Let us take a fixed number less than 1.


Also, let us take the sequence УО 3T 


1@ Ir 12: т, 


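Let $j_1 = 2$, so that the first block of primes is 2, 3. Let us declare $(0, 1, 0, 0, \ldots)$ and $(2, 1, 0, 0, \ldots)$ as basic vectors. The cylinder sets whose bases are the points $(0, 1)$ and $(2, 1)$ carry probability
$$\Big(1 - \frac12\Big)\Big(1 - \frac13\Big)\frac13 + \Big(1 - \frac12\Big)\Big(1 - \frac13\Big)\frac{1}{2^2 \cdot 3} = \frac19 + \frac{1}{36} = \frac{5}{36}.$$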
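The arithmetic for the first block can be checked exactly with rational numbers. This minimal sketch assumes only the measure of the summary (mass $q^{-r}(1 - 1/q)$ at the point $r$ of the factor space belonging to the prime $q$); the function name is hypothetical.

```python
from fractions import Fraction

def cylinder_probability(exponents, primes):
    """P{x_1 = e_1, ..., x_k = e_k} = prod q^(-e) * (1 - 1/q) over the given primes."""
    p = Fraction(1)
    for e, q in zip(exponents, primes):
        p *= Fraction(1, q ** e) * (1 - Fraction(1, q))
    return p

block = [2, 3]                 # first block of primes, j_1 = 2
basic = [(0, 1), (2, 1)]       # basic vectors for the numbers 3 and 2^2 * 3 = 12
total = sum(cylinder_probability(v, block) for v in basic)
print(total)                   # 5/36
```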
We now determine $j_2$. The set of numbers of the form $2^a 3^b$ has density zero. Thus the complementary set $C_1$ has density 1. We now take the numbers 5, 7, 10, 11, 13, 14, 15, 17, ..., $M_1$ of $C_1$, choosing $M_1$ so large that the logarithmic density of these numbers up to $M_1$,
$$\frac{1}{\log M_1} \sum_{n \le M_1,\ n \in C_1} \frac{1}{n},$$
is close to 1. Let $\phi(j) = (1 - 1/q_1) \cdots (1 - 1/q_j)$. We take $j_2$ so large that the cylinder sets attached to these numbers carry small total probability, at most
$$\phi(j_2) \sum_{n \le M_1} \frac{1}{n} \le \phi(j_2)\,(1 + \log M_1).$$
We now introduce basic vectors so that 5, 7, 10, ..., $M_1$ all become members of our right-complete set. In order to admit 5, we declare $(0, 0, 1, 0, 0, \ldots)$ as a basic vector. In order to admit 7, we declare $(0, 0, 0, 1, 0, 0, \ldots)$ as a basic vector. For 10, we declare $(1, 0, 1, 0, 0, \ldots)$, and proceed like this until $M_1$ gains entry into our right-complete set. Of course, we make $j_2$ so large that $q_{j_2} > M_1$.

Let $C_2$ be the complement of the set of numbers of the form $q_1^{a_1} \cdots q_{j_2}^{a_{j_2}}$. We choose an $M_2$ so large that the logarithmic density of $C_2$ up to $M_2$ is again close to 1, and a $j_3$ so large that $\phi(j_3)(1 + \log M_2)$ is small. We then admit basic vectors so that all $n \in C_2$ with $M_1 < n \le M_2$ gain entry into our right-complete set. Proceeding like this, we construct a right-complete (with respect to $J$) set whose upper logarithmic density is 1 but whose upper $J$-magnification has measure less than the fixed number chosen above.


Now, let $J$ be any given sequence such that $\log j_{n+1}/\log j_n$ is unbounded. The counterexample given above can be modified so as to prove the 'only if' part, as follows. Suppose $q_k, \ldots, q_l$ is a block of consecutive primes. Let $M$ be such that $q_k < M < q_l$. Consider the set of numbers all of whose prime factors are exclusively from among $q_1, q_2, \ldots, q_k$; let $C_k$ be the complement of this set. Consider the quantity
$$\frac{\log M - e^{\gamma} \log q_k}{\log M},$$
which, by Mertens' theorem, is approximately a lower bound for the logarithmic density of $C_k$ up to $M$ ($\gamma$ denotes Euler's constant). It equals
$$1 - e^{\gamma} \frac{\log q_k}{\log M}.$$
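A rough numerical illustration (not from the paper), assuming $q_k = 7$ and $M = 10^6$: the logarithmic density of $C_k$ up to $M$ is compared with the exact lower bound obtained from the Euler product $\prod_{p \le q_k}(1 - 1/p)^{-1}$ and with the Mertens approximation $1 - e^{\gamma}\log q_k/\log M$. For so small a $q_k$ the Mertens value $e^{\gamma}\log 7 \approx 3.47$ still falls short of the Euler product $4.375$, so the agreement is only approximate.

```python
import math

def smooth_numbers(primes, limit):
    """All n <= limit whose prime factors all lie in `primes` (including n = 1)."""
    nums = [1]
    for p in primes:
        extended = []
        for n in nums:
            m = n
            while m <= limit:        # n, n*p, n*p^2, ... up to the limit
                extended.append(m)
                m *= p
        nums = extended
    return nums

M = 10 ** 6
small_primes = [2, 3, 5, 7]                            # q_1, ..., q_k with q_k = 7
smooth_sum = sum(1 / n for n in smooth_numbers(small_primes, M))
harmonic = sum(1 / n for n in range(1, M + 1))
density_Ck = (harmonic - smooth_sum) / math.log(M)     # log density of C_k up to M
euler = math.prod(1 / (1 - 1 / p) for p in small_primes)
lower = (harmonic - euler) / math.log(M)               # exact lower bound
gamma = 0.5772156649015329                             # Euler's constant
approx = 1 - math.exp(gamma) * math.log(small_primes[-1]) / math.log(M)
print(density_Ck, lower, approx)
```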


We now use the following lemma:

Lemma: Let $a_1, a_2, \ldots$ be an increasing sequence of positive integers such that $\log a_{n+1}/\log a_n$ is unbounded. Take any $\varepsilon > 0$, $\delta > 0$. We can determine an $n$ and a positive integer $M$ such that
$$a_n < M < a_{n+1}, \qquad \frac{\log a_n}{\log M} < \varepsilon \qquad \text{and} \qquad \frac{\log M}{\log a_{n+1}} < \delta.$$

Rigorizing the nonrigorous part above is trivial.
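The lemma is constructive in a simple way: take $M = a_n^k$ with an integer $k > 1/\varepsilon$, so that $\log a_n/\log M = 1/k < \varepsilon$, and then look for an $n$ with $\log a_{n+1}/\log a_n > k/\delta$, which exists because the ratio is unbounded. The sketch below does this on a finite list (0-based indices), assuming $0 < \varepsilon \le 1$ and $0 < \delta \le 1$; the function name and the test sequence $a_n = 2^{n!}$ are illustrative.

```python
import math

def lemma_witness(a, eps, delta):
    """Return (n, M) with a[n] < M < a[n+1], log a[n]/log M < eps and
    log M/log a[n+1] < delta, or None if the finite list is too short."""
    assert 0 < eps <= 1 and 0 < delta <= 1
    k = math.floor(1 / eps) + 1            # k > 1/eps and k >= 2
    for n in range(len(a) - 1):
        if a[n] < 2:
            continue                       # need log a[n] > 0
        M = a[n] ** k                      # log a[n] / log M = 1/k < eps
        if math.log(M) / math.log(a[n + 1]) < delta:
            return n, M                    # then also M < a[n+1], since delta <= 1
    return None

a = [2 ** math.factorial(n) for n in range(1, 9)]   # log a_{n+1} / log a_n = n + 1
n, M = lemma_witness(a, eps=0.5, delta=0.45)
```

Python's `math.log` accepts arbitrarily large integers, so the very large values of $M$ cause no overflow.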

Corollary: Let $f(x)$ be a finite real-valued function defined on the set of positive integers. Suppose there is a sequence $J$ of positive integers $j_n$ such that $\log j_{n+1}/\log j_n$ is bounded and $f\big(q_1^{x_1} \cdots q_{j_n}^{x_{j_n}}\big)$ converges with probability 1 to a random variable $g(x)$ as $n \to \infty$. Then $f$ has a distribution and this is the same as the distribution of $g(x)$; we use logarithmic density.
REFERENCE 

Paul, E. M. (1962): Density in the light of probability theory. Sankhya, Series A, 24, 103-114.
LIMIT THEOREMS FOR SUMS OF INDEPENDENT RANDOM
VARIABLES WITH VALUES IN A HILBERT SPACE 


By S. R. S. VARADHAN
Indian Statistical Institute 


SUMMARY. In this article distributions on a real separable Hilbert space are considered. Limit distributions are derived for sums of infinitesimal random variables.

A representation similar to the Levy-Khintchine representation is derived for infinitely divisible distributions. Necessary and sufficient conditions for compactness are obtained in terms of the quantities occurring in the representation.


1. INTRODUCTION


The celebrated Levy-Khintchine representation for the characteristic function $\varphi(t)$ of an infinitely divisible distribution on the real line is given by
$$\varphi(t) = \exp\left[ i\gamma t - \frac{\sigma^2 t^2}{2} + \int_{-\infty}^{\infty} \left( e^{itx} - 1 - \frac{itx}{1 + x^2} \right) \frac{1 + x^2}{x^2}\, dG(x) \right] \qquad (1.1)$$
where $\gamma$ and $\sigma$ are real constants, $\sigma \ge 0$, and $G(x)$ is a bounded non-decreasing function of $x$ which is continuous at the origin. $\gamma$, $\sigma$ and $G$ are uniquely determined by $\varphi(t)$. Conversely, any function of the type (1.1) is the characteristic function of an infinitely divisible distribution. Khintchine and Bawly went further and proved that the limit distributions of sums of uniformly infinitesimal random variables are infinitely divisible and that they can be obtained as limits of certain accompanying infinitely divisible distributions. For all historical and other details concerning these we refer to Gnedenko and Kolmogorov (1954). Attempts have been made by several authors to extend these results to more general situations. We mention, in particular, the works of Bochner (1955) and (1958), Hunt (1956), Kloss (1961), Levy (1939), Takano (1955) and Vorobev (1954). Takano has generalised both the representation and the limit theorems to the case of a finite dimensional vector space. Levy has considered the circle group (the multiplicative group of complex numbers of modulus unity). The axiomatic development of the concept of characteristic functions by Bochner (1958) throws some light on the nature of the problem. Hunt has obtained the representation for one parametric semigroups of distributions on a Lie group. Vorobev has considered finite groups and Kloss a wide class of compact groups.
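As a sanity check on the form of (1.1) used here (a sketch in Python, with $G$ restricted to finitely many atoms away from the origin, an assumption made for illustration): putting mass $\lambda/2$ at $x = 1$, $\gamma = \lambda/2$ and $\sigma = 0$ should reproduce the Poisson characteristic function $\exp[\lambda(e^{it} - 1)]$, while $G = 0$ leaves the Gaussian factor $\exp(i\gamma t - \sigma^2 t^2/2)$.

```python
import cmath

def levy_khintchine(t, gamma, sigma, G_atoms):
    """phi(t) from (1.1) when G is purely atomic: G_atoms = [(x, mass), ...], x != 0."""
    s = 1j * gamma * t - 0.5 * sigma ** 2 * t ** 2
    for x, mass in G_atoms:
        kernel = cmath.exp(1j * t * x) - 1 - 1j * t * x / (1 + x * x)
        s += kernel * (1 + x * x) / (x * x) * mass
    return cmath.exp(s)

lam = 3.0
for t in (0.3, 1.0, 2.5):
    # Poisson(lam): G has mass lam/2 at x = 1, gamma = lam/2, sigma = 0
    lk = levy_khintchine(t, lam / 2, 0.0, [(1.0, lam / 2)])
    poisson = cmath.exp(lam * (cmath.exp(1j * t) - 1))
    print(t, abs(lk - poisson))
```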

The case of a general locally compact abelian group was considered very recently by K. R. Parthasarathy, R. Rangarao and the present author (1962a). The results obtained are mentioned briefly in the next section and we will have occasion to use them in the later sections.

The case of a non-locally compact group offers considerable difficulty. However, the particular case of a separable Hilbert space can be tackled using some compactness criteria of Prohorov (1956). In this paper we will consider the case of a Hilbert space, and the analysis will throw some light on the precise nature of the difficulties in extending these results to non-locally compact groups.
2. THE LOCALLY COMPACT CASE


In the present section we will briefly mention the results obtained for the locally compact case. We will be using some of these results in the later sections.


Let $X$ be a locally compact abelian group which is separable metric and let $Y$ be its character group. For $x \in X$ and $y \in Y$ we write $(x, y)$ to denote the value of the character $y$ at the point $x$.

Let $\mathcal{M}$ denote the convolution semigroup of probability measures on $X$, endowed with the weak topology. A sequence $\mu_n$ in $\mathcal{M}$ is shift compact whenever the sequence formed by some translates of these distributions is compact. Throughout the article $*$ denotes the convolution operation. When $x$ is an element of the group and $\lambda$ a distribution, $\lambda * x$ denotes the convolution of $\lambda$ with the distribution degenerate at $x$. $\lambda^n$ denotes the $n$-th convolution power of $\lambda$. $|\lambda|^2$ denotes $\lambda * \bar{\lambda}$, where $\bar{\lambda}$ is the distribution defined by $\bar{\lambda}(A) = \lambda(-A)$; $-A$ consists of the set of inverses of $A$. The following theorem was proved by Parthasarathy et al (1962).

Theorem 2.1: Let $\mu_n = \alpha_n * \beta_n$ for each $n$. If $\mu_n$ is compact then $\alpha_n$ and $\beta_n$ are shift compact.

Definition 2.1: A distribution $\mu$ is said to be infinitely divisible if for each $n$ there are elements $x_n$ in $X$ and $\lambda_n \in \mathcal{M}$ such that
$$\mu = \lambda_n^n * x_n.$$

Remark 2.1: This modification of the classical definition is intended to avoid the consequences of the presence of elements that are not divisible. In a divisible group this definition is equivalent to the classical definition, as can easily be seen.

Theorem 2.2: The totality of infinitely divisible distributions is a closed subsemigroup of $\mathcal{M}$.


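Definition 2.2: If $F$ is any totally finite measure on $X$, the distribution $e(F)$ associated with it is defined as follows:
$$e(F) = e^{-F(X)} \left[ 1 + F + \frac{F^2}{2!} + \cdots + \frac{F^n}{n!} + \cdots \right],$$
where $F^n$ is the $n$-fold convolution of $F$ and $1$ is the distribution degenerate at the identity.

For $\mu \in \mathcal{M}$ we denote by $\hat{\mu}(y)$ its characteristic function. We then have
$$\widehat{e(F)}(y) = \exp\left[ \int \{(x, y) - 1\}\, dF(x) \right].$$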
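On the integers (a simple abelian group whose characters are $x \mapsto e^{ixy}$), $e(F)$ can be computed by truncating the exponential series, and its characteristic function compared with $\exp[\int\{(x, y) - 1\}\,dF(x)]$. A minimal sketch; the measure $F$ and the truncation length are arbitrary assumptions.

```python
import cmath
import math

def convolve(a, b):
    """Convolution of two finite measures on the integers, given as {point: mass}."""
    out = {}
    for x, p in a.items():
        for z, q in b.items():
            out[x + z] = out.get(x + z, 0.0) + p * q
    return out

def e_of(F, terms=80):
    """e(F) = exp(-F(X)) * (1 + F + F^2/2! + ...), truncated after `terms` terms."""
    total = sum(F.values())
    power = {0: 1.0}          # F^0: unit mass at the identity 0
    series = {0: 1.0}
    for k in range(1, terms):
        power = convolve(power, F)
        for x, p in power.items():
            series[x] = series.get(x, 0.0) + p / math.factorial(k)
    return {x: math.exp(-total) * p for x, p in series.items()}

def char(mu, y):
    """Characteristic function sum_x mu{x} e^{ixy} of a measure on the integers."""
    return sum(p * cmath.exp(1j * x * y) for x, p in mu.items())

F = {-1: 0.4, 2: 0.9}         # hypothetical finite measure, total mass 1.3
mu = e_of(F)
```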

Theorem 2.3: If $\mu$ is infinitely divisible and if its characteristic function vanishes at some point, then $\mu$ has an idempotent factor.

Theorem 2.4: Let $\mu_n = e(F_n)$. Then the necessary and sufficient conditions that
(a) $\mu_n$ is shift compact, and
(b) if $\mu$ is any limit of shifts of $\mu_n$, then $\mu$ has no idempotent factors,
are
(i) for each neighbourhood $N$ of the identity $e$, the family $F_n$ restricted to $X - N$ is weakly conditionally compact;
(ii) for each $y \in Y$,
$$\sup_n \int [1 - \mathrm{Real}\,(x, y)]\, dF_n(x) < \infty.$$
Definition 2.3: A double sequence $\alpha_{nj}$, $j = 1, 2, \ldots, k_n$; $n = 1, 2, \ldots$ of distributions is said to be uniformly infinitesimal if
$$\lim_{n \to \infty} \sup_{1 \le j \le k_n} \sup_{y \in K} |\hat{\alpha}_{nj}(y) - 1| = 0$$
for every compact subset $K \subset Y$.
Lemma 2.1: There exists a real valued function $g(x, y)$ on $X \times Y$ which is continuous in both the variables $x$ and $y$ and such that
(i) $g(x, y_1 + y_2) = g(x, y_1) + g(x, y_2)$ for each $x \in X$ and $y_1, y_2 \in Y$,
(ii) for any compact subset $K \subset Y$, $\sup_{x \in X} \sup_{y \in K} |g(x, y)| < \infty$,
(iii) for each compact subset $K \subset Y$ there exists a neighbourhood $N_K$ of the identity in $X$ such that $(x, y) = e^{i g(x, y)}$ holds for $x \in N_K$ and $y \in K$,
(iv) $g(x, y) = 0$ whenever $x = e$, for any $y$.
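Theorem 2.5: Let $\alpha_{nj}$ be a uniformly infinitesimal sequence and let
$$\mu_n = \alpha_{n1} * \alpha_{n2} * \cdots * \alpha_{nk_n}.$$
Suppose $\mu_n$ is shift compact and that no limit of shifts of $\mu_n$ has an idempotent factor. Let $\beta_{nj} = \alpha_{nj} * x_{nj}$, where $x_{nj}$ is that element of the group $X$ defined by the equality
$$(x_{nj}, y) = \exp\left\{ -i \int g(x, y)\, d\alpha_{nj}(x) \right\}.$$
If
$$\lambda_n = \prod_{j=1}^{k_n} e(\beta_{nj}) * x_n, \qquad \text{where } x_n = -\sum_{j=1}^{k_n} x_{nj}$$
(the product denoting convolution), then
$$\lim_{n \to \infty} \sup_{y \in K} |\hat{\lambda}_n(y) - \hat{\mu}_n(y)| = 0$$
for every compact set $K$ of $Y$.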
Corollary 2.1: The limit distribution of sums of infinitesimal independent random variables is infinitely divisible.

Definition 2.4: A distribution $\mu$ is said to be Gaussian if it has the following properties:
(i) $\mu$ is infinitely divisible,
(ii) $\mu = e(F) * \alpha$, where $\alpha$ is infinitely divisible, implies that $F$ vanishes outside the identity.


Theorem 2.6: A distribution $\mu$ is Gaussian if and only if $\hat{\mu}(y)$ is of the form
$$\hat{\mu}(y) = (x_0, y) \exp[-\varphi(y)]$$
where $x_0$ is a fixed element of $X$ and $\varphi(y)$ a continuous non-negative function of $y$ satisfying the equation
$$\varphi(y_1 + y_2) + \varphi(y_1 - y_2) = 2[\varphi(y_1) + \varphi(y_2)]$$
for every pair $y_1, y_2 \in Y$.
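On $\mathbb{R}^d$, where the character group can be identified with $\mathbb{R}^d$ itself, any quadratic form $\varphi(y) = (Sy, y)$ with $S$ positive semi-definite satisfies the functional equation of Theorem 2.6, whereas a non-negative function such as the norm does not. A small numerical sketch under that identification (the helper names are illustrative):

```python
import random

def make_psd(d, rng):
    """S = A A^T for a random d x d matrix A (positive semi-definite)."""
    A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    return [[sum(A[i][k] * A[j][k] for k in range(d)) for j in range(d)] for i in range(d)]

def quad(S, y):
    """phi(y) = (S y, y)."""
    d = len(y)
    return sum(S[i][j] * y[i] * y[j] for i in range(d) for j in range(d))

def parallelogram_defect(phi, y1, y2):
    """phi(y1 + y2) + phi(y1 - y2) - 2[phi(y1) + phi(y2)]."""
    plus = [a + b for a, b in zip(y1, y2)]
    minus = [a - b for a, b in zip(y1, y2)]
    return phi(plus) + phi(minus) - 2 * (phi(y1) + phi(y2))

rng = random.Random(0)
S = make_psd(3, rng)
phi = lambda y: quad(S, y)
norm = lambda y: sum(a * a for a in y) ** 0.5     # non-negative but not quadratic
print(parallelogram_defect(phi, [1, 2, 3], [-1, 0, 2]), parallelogram_defect(norm, [1, 0, 0], [0, 1, 0]))
```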
Theorem 2.7: A distribution $\mu$ on $X$ is infinitely divisible without an idempotent factor if and only if $\hat{\mu}(y)$ is of the form
$$\hat{\mu}(y) = (x_0, y) \exp\left[ \int \{(x, y) - 1 - i g(x, y)\}\, dM(x) - \varphi(y) \right]$$
where
(i) $x_0$ is a fixed element of $X$,
(ii) $\varphi(y)$ is a function as in Theorem 2.6,
(iii) $M$ is a $\sigma$-finite measure giving finite mass outside every neighbourhood of the identity,
(iv) $\int [1 - \mathrm{Real}\,(x, y)]\, dM(x) < +\infty$ for each $y \in Y$.

Remark 2.2: This representation is not in general unique. Examples of non-uniqueness and conditions for uniqueness are discussed by Parthasarathy et al (1962a). It turns out that if the character group is connected then the representation is unique.

Corollary 2.2: Every one parametric weakly continuous semigroup $\mu_t$ has a unique representation
$$\hat{\mu}_t(y) = (x_t, y) \exp\left[ t \int \{(x, y) - 1 - i g(x, y)\}\, dM(x) - t\varphi(y) \right]$$
where $x_t$ is a continuous semigroup in $X$ and $M$ and $\varphi$ are as in Theorem 2.7.

Remark 2.3: Of the results mentioned in this section, Theorems 2.1, 2.2 and the necessity part of Theorem 2.3 are valid in any complete separable metric group. We will also need the following corollary and theorem deducible from Theorem 2.2.

Corollary 2.3: $\lambda_n$ is shift compact if and only if $|\lambda_n|^2$ is compact.

Theorem 2.8: Let $\lambda_n$ be a sequence such that $\lambda_n$ is a factor of $\lambda_{n+1}$ for all $n$. Then if $\lambda_n$ is shift compact there are translates $\lambda_n'$ of $\lambda_n$ which converge.

Remark 2.4: We can assume instead that $\lambda_{n+1}$ is a factor of $\lambda_n$ for each $n$, and then also the theorem will be valid.


3. PRELIMINARIES

$X$ is a real separable Hilbert space, $(x, y)$ denotes the inner product and $\|x\|$ the norm. With vector addition as group operation $X$ becomes a complete separable metric group. We will denote by $\mathcal{M}$ the semigroup of all distributions.

For every $\mu \in \mathcal{M}$ its characteristic function is defined on $X$ by the formula
$$\hat{\mu}(y) = \int e^{i(x, y)}\, d\mu(x).$$
We will, in this section, mention some results obtained by Prohorov (1956) concerning compactness criteria for distributions on $X$.

Definition 3.1: A positive semi-definite Hermitian operator $S$ is called an S-operator if it has finite trace. The class of sets of the type $\{x : (Sx, x) < t\}$, where $S$ runs over S-operators and $t$ over positive numbers, forms a neighbourhood system at the origin for a certain topology which is called the S-topology. A net $x_\alpha$ converges to zero in the S-topology if and only if $(Sx_\alpha, x_\alpha)$ converges to zero for every S-operator $S$.
We have the following theorem concerning characteristic functions and the S-topology, which was obtained by Sazonov (1958).

Theorem 3.1: In order that a function $\hat{\mu}(y)$ may be the characteristic function of a distribution on $X$ it is necessary and sufficient that $\hat{\mu}(0) = 1$, $\hat{\mu}(y)$ be positive definite and continuous at $y = 0$ in the S-topology. (Here and elsewhere in the article 0 will denote the null element of $X$ or the identity of the group and will be called the origin.)

We also have the following theorem of Prohorov (1956).

Theorem 3.2: In order that a positive definite function $\hat{\mu}(y)$ with $\hat{\mu}(0) = 1$ be the characteristic function of a distribution on $X$ it is necessary and sufficient that for every $\varepsilon > 0$ there exists an S-operator $S_\varepsilon$ such that
$$1 - R\,\hat{\mu}(y) \le (S_\varepsilon y, y) + \varepsilon,$$
where $R$ denotes the real part.

Definition 3.2: Let $\mu$ be a distribution on $X$ for which $\int (x, y)^2\, d\mu < \infty$ for each $y$. Then the covariance operator $S$ of $\mu$ is that Hermitian operator for which
$$(Sy, y) = \int (x, y)^2\, d\mu(x).$$
This operator $S$ will be positive semi-definite and will be an S-operator if and only if
$$\int \|x\|^2\, d\mu(x) < \infty.$$

Definition 3.3: A sequence $S_n$ of S-operators will be called compact if and only if the following two conditions are satisfied:
(i) $\sup_n \mathrm{Trace}(S_n) < \infty$,
(ii) $\lim_{N \to \infty} \sup_n \sum_{j=N}^{\infty} (S_n e_j, e_j) = 0$ for some orthonormal sequence $e_1, e_2, \ldots$.

When $S$ is the covariance operator of a distribution $\mu$ on $X$ for which $\int \|x\|^2\, d\mu < \infty$, we have the relation
$$\mathrm{Trace}(S) = \sum_{j=1}^{\infty} (Se_j, e_j) = \int \|x\|^2\, d\mu(x),$$
where $e_1, e_2, \ldots$ is any orthonormal basis.
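For an empirical distribution on $\mathbb{R}^4$ (a hypothetical finite-dimensional stand-in for $\mu$), the relation Trace $S = \int \|x\|^2\,d\mu$ can be checked in two different orthonormal bases. Note that the covariance operator of Definition 3.2 is the uncentred second-moment operator, $(Sy, y) = \int (x, y)^2\,d\mu$.

```python
import math
import random

rng = random.Random(1)
d, n = 4, 500
# hypothetical empirical distribution: n equally weighted points in R^d
sample = [[rng.gauss(0, 1 + j) for j in range(d)] for _ in range(n)]

# covariance operator of Definition 3.2 (not centred): (S y, y) = mean of (x, y)^2
S = [[sum(x[i] * x[j] for x in sample) / n for j in range(d)] for i in range(d)]

def trace_in_basis(S, basis):
    """sum_j (S e_j, e_j) for an orthonormal basis given as a list of vectors."""
    m = len(S)
    return sum(sum(S[i][k] * e[i] * e[k] for i in range(m) for k in range(m)) for e in basis)

standard = [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]
a = 0.6   # rotate coordinates 0 and 1 by the angle a: another orthonormal basis
rotated = [[math.cos(a), math.sin(a), 0.0, 0.0], [-math.sin(a), math.cos(a), 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
mean_sq_norm = sum(sum(c * c for c in x) for x in sample) / n   # integral of ||x||^2
print(trace_in_basis(S, standard), trace_in_basis(S, rotated), mean_sq_norm)
```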
The following theorems concerning conditions for compactness of a sequence $\mu_n$ of distributions on $X$ were obtained by Prohorov (1956).

Theorem 3.3: In order that a sequence $\mu_n$ of distributions on $X$ be weakly conditionally compact it is necessary and sufficient that for every $\varepsilon > 0$ there exists a compact sequence $S_n$ of S-operators such that
$$1 - R\,\hat{\mu}_n(y) \le (S_n y, y) + \varepsilon$$
for all $n$ and $y$. Here $\hat{\mu}_n(y)$ is the characteristic function of $\mu_n$.
Theorem 3.4: Let $\mu_n$ be a sequence of distributions for which the covariance operators $S_n$ exist and are S-operators. Let further $S_n$ be compact. Then $\mu_n$ is weakly conditionally compact.

Theorem 3.5: In order that a sequence $\mu_n$ of distributions on $X$ may be weakly conditionally compact it is necessary that for any $\varepsilon > 0$
$$\lim_{N \to \infty} \sup_n \mu_n[r_N^2(x) > \varepsilon] = 0.$$

In defining the covariance operator in Definition 3.2 we assumed that $\mu$ is a distribution. Actually, if $M$ is any $\sigma$-finite measure for which
$$\int \|x\|^2\, dM < \infty,$$
then $S$ is well defined as an S-operator by the relation
$$(Sy, y) = \int (x, y)^2\, dM(x).$$

We will need also the following.

Theorem 3.6 (Rangarao, 1960): Let $\mu_n \Rightarrow \mu$. Let $\{f_\alpha\}$ be a family of continuous functions which is equicontinuous and uniformly bounded. Then
$$\lim_{n \to \infty} \sup_\alpha \left| \int f_\alpha\, d\mu_n - \int f_\alpha\, d\mu \right| = 0.$$


4. AN ESTIMATE OF THE VARIANCE

Let $X_1, X_2, \ldots, X_n$ be $n$ symmetric independent random variables in the Hilbert space $X$. We will give in this section an estimate for the variance $E\|X_1 + \cdots + X_n\|^2$ when each $X_j$ is bounded uniformly in norm by a constant $C$ independent of $j$.

To this end we introduce the concentration function $Q_\mu(t)$ following Levy [(1954), page 138].

Definition 4.1: The concentration function $Q_\mu(t)$ of a distribution $\mu$ in the Hilbert space $X$ is defined for $0 < t < \infty$ as
$$Q_\mu(t) = \sup_{x \in X} \mu(S_t + x),$$
where $S_t$ denotes the sphere $\{z : \|z\| \le t\}$ and $S_t + x$ its translate by the element $x$ of $X$. We now list a few elementary properties of these functions.

Theorem 4.1: (1) $Q_\mu(t)$ is a nondecreasing function of $t$ and $\lim_{t \to \infty} Q_\mu(t) = 1$.
(2) If $\mu_1 * \mu_2 = \mu$ then, for every $t$, $Q_\mu(t) \le \min[Q_{\mu_1}(t), Q_{\mu_2}(t)]$.
(3) If $\mu_n$ is shift compact then
$$\lim_{t \to \infty} \inf_n Q_{\mu_n}(t) = 1.$$
Proof: (1) and (2) are obvious and (3) is a consequence of tightness.

Theorem 4.2: Let $X_1, X_2, \ldots, X_n$ be $n$ mutually independent symmetric random variables. Let $S_k = X_1 + \cdots + X_k$. Further, let $Q(t)$ denote the concentration function of the sum $S_n = X_1 + \cdots + X_n$. If $T$ is defined as
$$T = \sup_{1 \le k \le n} \|S_k\|,$$
then one has for any $t > 0$
$$P\{T > 4t\} \le 2[1 - Q(t)].$$
Proof: This remarkable result of Levy, which is a refinement of Kolmogorov's inequality in terms of the variance, although proved by him for the real line, offers no difficulty for generalisation. Let the events $E_k$ be defined as follows:
$$E_k = [\|S_1\| \le 4t, \ldots, \|S_{k-1}\| \le 4t,\ \|S_k\| > 4t],$$
$$\{T > 4t\} = E_1 \cup E_2 \cup \cdots \cup E_n, \qquad E_j \cap E_k = \emptyset \quad (j \ne k).$$

By $P_k(\ )$ we will denote the probability of the event within the brackets given that the event $E_k$ has occurred. We then have
$$P_k\{\|S_n\| \le 2t\} \le P_k\{\|S_n - S_k\| > 2t\} = P\{\|S_n - S_k\| > 2t\}. \qquad (4.1)$$
This is because $E_k$ and $\|S_n\| \le 2t$ imply that $\|S_n - S_k\| > 2t$, and $S_n - S_k$ is distributed independently of $E_k$. Let us further suppose that $Q(t) > \frac12$. We now consider the distribution $\mu_k$ of $S_n - S_k$, whose concentration function is denoted by $Q_k(t)$. Since $\mu_k$ is a factor of $\mu$ (the distribution of $S_n$), Theorem 4.1 (2) gives $Q_k(t) \ge Q(t) > \frac12$. This implies, for every sufficiently small $\varepsilon > 0$, the existence of a point $x$ in the space $X$ such that
$$\mu_k(S_t + x) > Q_k(t) - \varepsilon \ge Q(t) - \varepsilon > \tfrac12. \qquad (4.2)$$
Since $\mu_k$ is a symmetric distribution,
$$\mu_k(S_t - x) = \mu_k(-(S_t + x)) = \mu_k(S_t + x) > \tfrac12.$$
Therefore $(S_t + x) \cap (S_t - x) \ne \emptyset$. In other words there exists a point $y$ such that
$$\|y - x\| \le t \quad \text{and} \quad \|y + x\| \le t.$$
These two imply that $\|x\| \le t$ and hence
$$x + S_t \subset S_{2t}. \qquad (4.3)$$
From (4.2) and (4.3) follows $\mu_k(S_{2t}) > Q(t) - \varepsilon$. Since $\varepsilon$ is arbitrary we have
$$\mu_k(S_{2t}) \ge Q(t), \qquad (4.4)$$
which is the same as
$$P\{\|S_n - S_k\| > 2t\} \le 1 - Q(t). \qquad (4.5)$$
We have further, using (4.1) and (4.5),
$$P\{T > 4t,\ \|S_n\| \le 2t\} = \sum_{k=1}^{n} P(E_k)\, P_k\{\|S_n\| \le 2t\} \le \sum_{k=1}^{n} P(E_k)[1 - Q(t)] = P\{T > 4t\}[1 - Q(t)]. \qquad (4.6)$$
We also have
$$P\{T \le 4t,\ \|S_n\| \le 2t\} \le P\{T \le 4t\} = 1 - P\{T > 4t\}. \qquad (4.7)$$
Adding (4.6) and (4.7) we get
$$P\{\|S_n\| \le 2t\} \le 1 - P\{T > 4t\}\, Q(t). \qquad (4.8)$$
From (4.4), putting $k = 0$ (so that $\mu_0 = \mu$), we obtain
$$P\{\|S_n\| \le 2t\} \ge Q(t). \qquad (4.9)$$
(4.8) and (4.9) imply, since $Q(t) > \frac12$,
$$P\{T > 4t\} \le \frac{1 - Q(t)}{Q(t)} \le 2[1 - Q(t)]. \qquad (4.10)$$
However, if $Q(t) \le \frac12$ the theorem is trivially true. Thus the proof of the theorem is complete.
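Levy's inequality can be checked exactly in the simplest case, the one-dimensional Hilbert space with $X_j = \pm 1$ equally likely (an illustrative assumption). The law of $S_n$ is binomial, $Q(t)$ is a maximum over windows of integer values, and $P\{T > 4t\}$ follows from a dynamic programme over the paths that have not yet left $[-4t, 4t]$; everything is kept in exact rational arithmetic.

```python
from fractions import Fraction
from math import comb, floor

def walk_law(n):
    """Exact law of S_n for the simple symmetric walk with steps +1/-1."""
    return {2 * k - n: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

def concentration(law, t):
    """Q(t) = sup_x P(|S - x| <= t) for an integer-valued law: best window [a, a + 2t]."""
    width = floor(2 * t)
    return max(sum(p for v, p in law.items() if a <= v <= a + width) for a in law)

def prob_max_exceeds(n, level):
    """P(max_{1<=k<=n} |S_k| > level), by dynamic programming over surviving paths."""
    alive = {0: Fraction(1)}
    exceeded = Fraction(0)
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                if abs(s + step) > level:
                    exceeded += p / 2          # path leaves [-level, level] now
                else:
                    nxt[s + step] = nxt.get(s + step, Fraction(0)) + p / 2
        alive = nxt
    return exceeded

n = 30
law = walk_law(n)
for t in (1, 2.5, 4, 5, 6):
    print(t, float(prob_max_exceeds(n, 4 * t)), float(2 * (1 - concentration(law, t))))
```

For small $t$ the right-hand side exceeds 1 and the check is trivial, exactly as in the last step of the proof.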
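We have the following theorem which gives an estimate of the variance. This is well known for the real line. We reproduce the proof for the real line, which can be found in Halmos [(1950), page 198], replacing however $|x|$ by $\|x\|$.

Theorem 4.3: Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables in a Hilbert space, uniformly bounded by a constant $c$ in norm. Let each $X_j$ have zero expectation. In addition let
$$P\Big(\sup_{1 \le j \le n} \|S_j\| \le d\Big) \ge \varepsilon > 0,$$
where $S_j = X_1 + \cdots + X_j$. Then
$$E\|S_n\|^2 \le \frac{d^2 + (c + d)^2}{\varepsilon}.$$

Proof: Let the sets $E_k$ be defined as follows:
$$E_k = \Big[\sup_{1 \le j \le k} \|S_j\| \le d\Big].$$
Then $E_1 \supset E_2 \supset \cdots \supset E_n$ and $P(E_n) \ge \varepsilon > 0$. By $P$ we mean the product measure of $X_1, X_2, \ldots, X_n$. Let us define
$$F_k = E_{k-1} - E_k \quad \text{and} \quad \alpha_k = \int_{E_k} \|S_k\|^2\, dP.$$
We will take $E_0$ as the whole space and $\alpha_0$ as zero. We then have
$$\begin{aligned}
\alpha_k - \alpha_{k-1} &= \int_{E_k} \|S_k\|^2\, dP - \int_{E_{k-1}} \|S_{k-1}\|^2\, dP \\
&= \int_{E_{k-1}} \|S_k\|^2\, dP - \int_{F_k} \|S_k\|^2\, dP - \int_{E_{k-1}} \|S_{k-1}\|^2\, dP \\
&= \int_{E_{k-1}} \big[ 2(S_{k-1}, X_k) + \|X_k\|^2 \big]\, dP - \int_{F_k} \|S_k\|^2\, dP \\
&= P(E_{k-1})\, E\|X_k\|^2 - \int_{F_k} \|S_k\|^2\, dP \\
&\ge P(E_{k-1})\, E\|X_k\|^2 - (c + d)^2 P(F_k).
\end{aligned} \qquad (4.11)$$
We notice here that $\int_{E_{k-1}} (S_{k-1}, X_k)\, dP = 0$, since $X_k$ is independent of $S_{k-1}$, has zero expectation, and is independent of $E_{k-1}$ as well. Moreover, $F_k \subset E_{k-1}$, hence $\|S_k\| \le \|S_{k-1}\| + \|X_k\| \le c + d$ over $F_k$. Since $P(E_n) \le P(E_k)$ for any $k$ we have
$$\alpha_k - \alpha_{k-1} \ge P(E_n)\, E\|X_k\|^2 - (c + d)^2 P(F_k) \qquad \text{for } k = 1, 2, \ldots, n. \qquad (4.12)$$
Since $F_1, F_2, \ldots, F_n$ are all disjoint, adding (4.12) for $k = 1, 2, \ldots, n$ and using $E\|S_n\|^2 = \sum_k E\|X_k\|^2$ (independence and zero expectations), we have
$$\alpha_n \ge P(E_n)\, E\|S_n\|^2 - (c + d)^2. \qquad (4.13)$$
However, since $\|S_n\| \le d$ on $E_n$,
$$\alpha_n = \int_{E_n} \|S_n\|^2\, dP \le d^2. \qquad (4.14)$$
(4.13) and (4.14) give
$$E\|S_n\|^2 \le \frac{d^2 + (c + d)^2}{P(E_n)} \le \frac{d^2 + (c + d)^2}{\varepsilon}.$$

This completes the proof.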
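Theorem 4.3 can be checked in the same toy setting as before (steps $\pm 1$ with equal probability, so $c = 1$, zero expectation and $E S_n^2 = n$; an illustrative assumption). The probability $\varepsilon = P(\sup_j |S_j| \le d)$ is computed exactly and the bound $[d^2 + (c + d)^2]/\varepsilon$ that follows from (4.13) and (4.14) is compared with $n$. The helper names are hypothetical.

```python
from fractions import Fraction

def prob_max_within(n, d):
    """P(max_{1<=j<=n} |S_j| <= d) for the simple symmetric +/-1 walk (exact DP)."""
    alive = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                if abs(s + step) <= d:         # keep only paths still inside [-d, d]
                    nxt[s + step] = nxt.get(s + step, Fraction(0)) + p / 2
        alive = nxt
    return sum(alive.values(), Fraction(0))

def variance_bound(c, d, eps):
    """Right-hand side of Theorem 4.3: (d^2 + (c + d)^2) / eps."""
    return (d ** 2 + (c + d) ** 2) / eps

n, c = 40, 1                                    # E S_n^2 = n for +/-1 steps
for d in (3, 5, 8, 12):
    eps = prob_max_within(n, d)
    print(d, float(eps), float(variance_bound(c, d, eps)))
```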
Theorem 4.4: Let $X_1, X_2, \ldots, X_n$ be symmetric independent random variables in the Hilbert space such that $\|X_i\| \le c$ for $i = 1, 2, \ldots, n$. Let $Q(t)$ denote the concentration function of $S_n = X_1 + \cdots + X_n$. Then
$$E\|S_n\|^2 \le \frac{16t^2 + (c + 4t)^2}{2Q(t) - 1}$$
for any $t$ such that $Q(t) > \frac12$.
Proof: This follows at once from Theorems 4.2 and 4.3 and the fact that a bounded symmetric random variable has zero expectation: by Theorem 4.2, $P\{\sup_k \|S_k\| \le 4t\} \ge 1 - 2[1 - Q(t)] = 2Q(t) - 1$, and Theorem 4.3 is applied with $d = 4t$ and $\varepsilon = 2Q(t) - 1$.


5. INFINITELY DIVISIBLE DISTRIBUTIONS 


In this section we will obtain a representation for infinitely divisible distributions. The definition of an infinitely divisible distribution is the same as Definition 2.1, but since the Hilbert space as a group is divisible it is equivalent to the classical definition, which requires that $\mu$ be written as $\lambda_n^n$ for each $n$.

As we have already remarked at the end of Section 2, some of the results mentioned in that section are valid for the Hilbert space; we will now state them. We will keep in mind also that the Hilbert space has no nontrivial compact subgroups and hence there are no idempotent distributions other than the one degenerate at the origin.

Theorem 5.1: The infinitely divisible distributions form a closed subsemigroup among all distributions.

Theorem 5.2: If $\mu$ is an infinitely divisible distribution and $\hat{\mu}(y)$ its characteristic function then $\hat{\mu}(y)$ is nonvanishing.

For every finite measure $F$ the infinitely divisible distribution $e(F)$ is associated in the same way as in Definition 2.2. We then have the following theorem.

Theorem 5.3: Let $\mu_n = e(F_n)$. In order that $\mu_n$ may be shift compact it is necessary that
(i) for each neighbourhood $N$ of the identity, $F_n$ restricted to $N'$ (the complement of $N$) is weakly conditionally compact,
(ii) $\sup_n \int [1 - \cos(x, y)]\, dF_n < \infty$ for each $y$.

Theorem 5.4: Let $\mu_n \Rightarrow \mu$. Then $\hat{\mu}_n(y) \to \hat{\mu}(y)$ uniformly over every bounded sphere.
Proof: Since the set of functions $e^{i(x, y)}$, as $y$ varies over a bounded sphere, forms an equicontinuous and uniformly bounded family of functions in $x$, the theorem follows from Theorem 3.6.

Theorem 5.5: Let $\mu_n$ be shift compact and $\hat{\mu}_n(y) \to \hat{\mu}(y)$ uniformly over bounded spheres. Then $\mu_n \Rightarrow \mu$.
Proof: Since $\mu_n$ is shift compact, let $x_n$ in $X$ be chosen such that $\mu_n * x_n$ is compact. We will now show that $x_n$ in $X$ is compact. If $x_n$ is not compact then we can produce a subsequence from $x_n$ which has no further convergent subsequence. We will denote the subsequence by $x_n$ itself. Since $\mu_n * x_n$ is weakly compact it has a subsequence $\mu_{n_i} * x_{n_i}$ converging weakly. Thus $\hat{\mu}_{n_i}(y) e^{i(x_{n_i}, y)}$ as well as $\hat{\mu}_{n_i}(y)$ converge uniformly over bounded spheres. Since from the compactness of $\mu_n * x_n$ one can conclude the existence of a sphere $S$ such that $\inf_n |\hat{\mu}_n(y)| > \varepsilon > 0$ for all $y$ in $S$, it follows that $e^{i(x_{n_i}, y)}$ converges uniformly over $S$ and hence $x_{n_i}$ converges in norm, contradicting the choice of the subsequence. Hence $x_n$ is compact, so $\mu_n$ is compact, and by Theorem 5.4 every weak limit point of $\mu_n$ has characteristic function $\hat{\mu}$. This proves the theorem.


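Before obtaining the representation we will show that if $e(F_n)$ is shift compact then
$$\sup_n \int_{\|x\| \le 1} \|x\|^2\, dF_n < \infty.$$

To this end we consider the following lemma.

Lemma 5.1: Let $f(y)$ be a non-negative function on $X$ such that $f(2y) \le 4f(y)$ for all values of $y$. If $f(y) \le \varepsilon$ whenever $(Sy, y) \le \delta$, where $S$ is some S-operator, then
$$f(y) \le (S_0 y, y) + \varepsilon \quad \text{for all } y,$$
where $S_0 = 4\varepsilon\delta^{-1}S$.
Proof: Defining $S_1 = \varepsilon\delta^{-1}S$, we see that
$$f(y) \le \varepsilon \quad \text{whenever } (S_1 y, y) \le \varepsilon. \qquad (5.1)$$
Further, if $(S_1 y, y) \le 4^n\varepsilon$, where $n$ is a positive integer, then, denoting by $y_n$ the element $2^{-n}y$, we have
$$(S_1 y_n, y_n) = 4^{-n}(S_1 y, y) \le \varepsilon,$$
consequently $f(y_n) \le \varepsilon$. But since $f(2y) \le 4f(y)$, we have $f(y) \le 4^n f(y_n) \le 4^n\varepsilon$. So, together with (5.1), we have
$$f(y) \le 4^n\varepsilon \quad \text{whenever } (S_1 y, y) \le 4^n\varepsilon. \qquad (5.2)$$
Let $y$ be any element of $X$ and let $(S_1 y, y) = t$.
Case (i): Let $t > \varepsilon$. If $n$ is a non-negative integer such that
$$4^n\varepsilon < t \le 4^{n+1}\varepsilon, \qquad (5.3)$$
we have, since $(S_1 y, y) = t \le 4^{n+1}\varepsilon$, using (5.2) and (5.3),
$$f(y) \le 4^{n+1}\varepsilon < 4t = 4(S_1 y, y). \qquad (5.4)$$
Case (ii): Let $t \le \varepsilon$. Then from (5.1) we have
$$f(y) \le \varepsilon. \qquad (5.5)$$
(5.4) and (5.5) give at once $f(y) \le \max[\varepsilon, 4(S_1 y, y)] \le \varepsilon + (S_0 y, y)$, since $S_0 = 4S_1$.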


We shall need, while proving the next theorem, the following inequality. If $a_1, \ldots, a_m$ are any $m$ real numbers such that $|a_j| \le 1$ for $1 \le j \le m$, then
$$1 - a_1 a_2 \cdots a_m \le \sum_{j=1}^{m} (1 - a_j). \qquad (5.6)$$
This inequality is proved by induction if all the $a$'s are non-negative. If at least one of them, say $a_1$, is negative,
$$1 - a_1 \cdots a_m \le 1 + |a_1 \cdots a_m| \le 1 + |a_1| = 1 - a_1 \le \sum_{j=1}^{m} (1 - a_j).$$
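Inequality (5.6) is easy to spot-check numerically. This sketch draws random tuples in $[-1, 1]$ and records the smallest value of $\sum_j (1 - a_j) - (1 - a_1 \cdots a_m)$, which (5.6) asserts is non-negative; equality holds, for instance, when all but one of the $a_j$ equal 1.

```python
import math
import random

def gap(a):
    """sum(1 - a_j) - (1 - a_1 ... a_m); inequality (5.6) says this is >= 0."""
    return sum(1 - x for x in a) - (1 - math.prod(a))

rng = random.Random(2)
trials = [[rng.uniform(-1, 1) for _ in range(rng.randint(1, 8))] for _ in range(20000)]
worst = min(gap(a) for a in trials)
print(worst)
```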
We will now prove the following theorem. 
Theorem 5.6: Let $F_n$ be a sequence of finite measures such that $e(F_n)$ is shift compact. Then
$$\sup_n \int_{\|x\| \le 1} \|x\|^2\, dF_n < \infty.$$
Proof: We assume without any loss of generality that each $F_n$ vanishes outside the unit sphere. Otherwise we can consider the restriction of $F_n$ to the unit sphere instead of $F_n$. Let $M_n = F_n + \bar{F}_n$. Then $e(M_n) = |e(F_n)|^2 = \lambda_n$ is compact. We will show that $\int \|x\|^2\, dM_n$ is uniformly bounded. To this end we assume that the total mass of $M_n$ is an integer for every $n$. If this were not so we can write $M_n = M_n^{(1)} + M_n^{(2)}$, where $M_n^{(1)}$ is symmetric with an integral total mass and $M_n^{(2)}$ has total mass less than unity. Consequently
$$\int_{\|x\| \le 1} \|x\|^2\, dM_n^{(2)} < 1.$$
Since our aim is to prove that $\sup_n \int_{\|x\| \le 1} \|x\|^2\, dM_n < \infty$, it suffices to show that
$$\sup_n \int_{\|x\| \le 1} \|x\|^2\, dM_n^{(1)} < \infty.$$

Now since the total mass of $M_n$ is an integer, say $k_n$, we will write
$$M_n = F_{n1} + F_{n2} + \cdots + F_{nk_n},$$
where $F_{nj}$, for $j = 1, 2, \ldots, k_n$; $n = 1, 2, \ldots$, is a symmetric probability measure in the unit sphere. Let us now denote by $\mu_n$ the convolution
$$\mu_n = F_{n1} * F_{n2} * \cdots * F_{nk_n}.$$
Since each $F_{nj}$ is symmetric and has zero expectation,
$$\int \|x\|^2\, d\mu_n = \sum_{j=1}^{k_n} \int \|x\|^2\, dF_{nj} = \int \|x\|^2\, dM_n.$$
Hence it suffices to show that $\sup_n \int \|x\|^2\, d\mu_n < \infty$.

If $Q_n(t)$ denotes the concentration function of $\mu_n$, from Theorem 4.4 (with $c = 1$) we have
$$\int \|x\|^2\, d\mu_n \le \frac{16t^2 + (4t + 1)^2}{2Q_n(t) - 1}$$
whenever $Q_n(t) > \frac12$. Therefore, it is enough to prove that
$$\inf_n Q_n(t) \ge \tfrac34 \quad \text{for some } t,$$
which will follow from Theorem 4.1 if we prove that $\mu_n$ is weakly conditionally compact.

Since each $\hat{F}_{nj}(y)$ is a real characteristic function and
$$\hat{\mu}_n(y) = \prod_{j=1}^{k_n} \hat{F}_{nj}(y), \qquad (5.7)$$
it follows from (5.6) that
$$1 - \hat{\mu}_n(y) \le \sum_{j=1}^{k_n} [1 - \hat{F}_{nj}(y)] = \sum_{j=1}^{k_n} \int [1 - \cos(x, y)]\, dF_{nj} = \int [1 - \cos(x, y)]\, dM_n = f_n(y), \text{ say}. \qquad (5.8)$$
We also have $\hat{\lambda}_n(y) = \exp[-f_n(y)]$ and hence for any given $\varepsilon$ there exists a $\delta$ depending only on $\varepsilon$ such that
$$f_n(y) \le \varepsilon \quad \text{if } 1 - \hat{\lambda}_n(y) \le \delta. \qquad (5.9)$$
Since $\lambda_n$ is compact we have from Theorem 3.5, for any given $\delta > 0$, a compact sequence $S_n$ of S-operators depending on $\delta$ only such that
$$1 - \hat{\lambda}_n(y) \le (S_n y, y) + \tfrac{\delta}{2}. \qquad (5.10)$$
From (5.9) and (5.10) it follows that whenever $(S_n y, y) \le \delta/2$, $f_n(y) \le \varepsilon$. Moreover $f_n(2y) \le 4f_n(y)$, since $1 - \cos 2\theta = 2(1 - \cos\theta)(1 + \cos\theta) \le 4(1 - \cos\theta)$. Hence from Lemma 5.1 we have
$$f_n(y) \le (S_n' y, y) + \varepsilon, \qquad (5.11)$$
where $S_n' = 8\varepsilon\delta^{-1}S_n$. Since $S_n$ is compact so is $S_n'$, and Theorem 3.3, (5.8) and (5.11) imply that $\mu_n$ is weakly conditionally compact. The proof of the theorem is now complete.


We will denote by $K(x, y)$ the following function:
$$K(x, y) = e^{i(x, y)} - 1 - \frac{i(x, y)}{1 + \|x\|^2}.$$
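The proof of Theorem 5.7 rests on two elementary bounds for $K$: $|K(x, y)| \le 2 + \|y\|$ when $\|x\| > 1$, and $|K(x, y)| \le \|x\|^2(\tfrac12\|y\|^2 + \|y\|)$ when $\|x\| \le 1$. A random spot-check in $\mathbb{R}^5$ (a finite-dimensional stand-in for $X$, chosen for illustration):

```python
import cmath
import math
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def K(x, y):
    """K(x, y) = exp(i(x, y)) - 1 - i(x, y) / (1 + ||x||^2), here on R^d."""
    xy = inner(x, y)
    return cmath.exp(1j * xy) - 1 - 1j * xy / (1 + inner(x, x))

rng = random.Random(3)
checked = {"far": 0, "near": 0}
violations = 0
for _ in range(20000):
    x = [rng.gauss(0, 0.5) for _ in range(5)]
    y = [rng.gauss(0, 2.0) for _ in range(5)]
    nx, ny = math.sqrt(inner(x, x)), math.sqrt(inner(y, y))
    if nx > 1:
        checked["far"] += 1
        bound = 2 + ny                               # bound used for ||x|| > 1
    else:
        checked["near"] += 1
        bound = nx ** 2 * (0.5 * ny ** 2 + ny)       # bound used for ||x|| <= 1
    if abs(K(x, y)) > bound + 1e-12:
        violations += 1
print(checked, violations)
```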
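Theorem 5.7: Let $\mu_n$ for each $n$ be of the form $e(F_n)$ where $F_n$ is a finite measure. Let $\mu_n * x_n \Rightarrow \mu$ for some suitably chosen elements $x_n$ in $X$. We further assume that $F_n$ is increasing. Then $F_n$ increases to a measure $F$ which may be $\sigma$-finite but gives finite mass outside every neighbourhood of the origin and for which
$$\int_{\|x\| \le 1} \|x\|^2\, dF < \infty.$$
In addition
$$\hat{\mu}(y) = \exp\left[ i(x_0, y) + \int K(x, y)\, dF(x) \right]$$
where $x_0$ is a fixed element of $X$.
Proof: Let $\hat{\lambda}_n(y)$ be defined as
$$\hat{\lambda}_n(y) = \exp\left[ \int K(x, y)\, dF_n \right].$$
Then $\hat{\lambda}_n(y)$ is the characteristic function of $\lambda_n$, which is the shift of $\mu_n$ by the element
$$z_n = -\int \frac{x}{1 + \|x\|^2}\, dF_n(x).$$
We will now show that $\hat{\lambda}_n(y)$ converges uniformly in $y$ over bounded spheres. For this purpose we write
$$\int K(x, y)\, dF_n = \int_{\|x\| > 1} K(x, y)\, dF_n + \int_{\|x\| \le 1} K(x, y)\, dF_n. \qquad (5.12)$$
Let $F$ be the limit of $F_n$. From Theorems 5.3 and 5.6 it follows that $F$ is finite outside every neighbourhood of the origin and
$$\int_{\|x\| \le 1} \|x\|^2\, dF < \infty.$$
Since, for $\|x\| > 1$,
$$|K(x, y)| \le 2 + \frac{|(x, y)|}{1 + \|x\|^2} \le 2 + \|y\|,$$
and $(F - F_n)(\{\|x\| > 1\}) \to 0$, it follows that
$$\sup_{y \in S} \left| \int_{\|x\| > 1} K(x, y)\, dF_n - \int_{\|x\| > 1} K(x, y)\, dF \right| \to 0 \quad \text{as } n \to \infty \qquad (5.13)$$
for every bounded sphere $S$. On the other hand, if $\|x\| \le 1$,
$$|K(x, y)| \le \left| e^{i(x, y)} - 1 - i(x, y) \right| + \frac{|(x, y)|\, \|x\|^2}{1 + \|x\|^2} \le \tfrac12 (x, y)^2 + \|x\|^2 \|y\| \le \|x\|^2 \left( \tfrac12 \|y\|^2 + \|y\| \right).$$
Since $\int_{\|x\| \le 1} \|x\|^2\, dF$ is finite it follows that
$$\sup_{y \in S} \left| \int_{\|x\| \le 1} K(x, y)\, dF_n - \int_{\|x\| \le 1} K(x, y)\, dF \right| \to 0 \quad \text{as } n \to \infty \qquad (5.14)$$
for every bounded sphere $S$. (5.12), (5.13) and (5.14) imply that
$$\sup_{y \in S} |\hat{\lambda}_n(y) - \hat{\lambda}(y)| \to 0 \quad \text{as } n \to \infty$$
for every bounded sphere $S$, where $\hat{\lambda}(y) = \exp\left[ \int K(x, y)\, dF \right]$. Since $\lambda_n$ is shift compact, from Theorem 5.5 it follows that $\lambda_n \Rightarrow \lambda$, and $\lambda$ has to be a shift of $\mu$. Hence the theorem follows.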
Theorem 5.8: Let $\hat\mu(y)$ be a function of the form

$$\hat\mu(y) = \exp\left[\int K(x, y)\, dF(x)\right]$$

where $F$ is a $\sigma$-finite measure giving finite mass outside every neighbourhood of the identity and for which

$$\int_{\|x\| \le 1} \|x\|^2\, dF < \infty.$$

Then $\hat\mu(y)$ is the characteristic function of an infinitely divisible distribution.

Proof: Let $N_n$ denote the sphere of radius $1/n$ around the origin and $F_n$ the restriction of $F$ to $N_n'$. Then $F_n$ increases to $F$. Let

$$\hat\mu_n(y) = \exp\left[\int K(x, y)\, dF_n\right].$$

From the proof of Theorem 5.7 it follows that

$$\sup_{y \in V} |\hat\mu_n(y) - \hat\mu(y)| \to 0 \quad \text{as } n \to \infty$$

for every bounded sphere $V$. In view of Theorems 5.1 and 5.5 it is enough to show that $\mu_n$ is shift compact. We will now show that $\lambda_n = |\mu_n|^2$ is compact. We have

$$\lambda_n = |\mu_n|^2 = |e(F_n)|^2 = e(M_n), \quad \text{where } M_n = F_n + \bar F_n \text{ and } \bar F_n(A) = F_n(-A).$$

Since $F_n$ increases to $F$ it follows that $M_n$ increases to $M$, where $M = F + \bar F$. Without any loss of generality we can assume that $F$, and hence $M$, vanishes outside the sphere $\|x\| \le 1$. We further have

$$1 - \hat\lambda_n(y) = 1 - \exp\left[\int [\cos(x, y) - 1]\, dM_n\right] = 1 - \exp\left[-\int [1 - \cos(x, y)]\, dM_n\right]$$
$$\le \int [1 - \cos(x, y)]\, dM_n \le \int [1 - \cos(x, y)]\, dM \le \tfrac12 \int_{\|x\| \le 1} (x, y)^2\, dM = (Sy, y).$$

Since

$$\int_{\|x\| \le 1} \|x\|^2\, dM = 2 \int_{\|x\| \le 1} \|x\|^2\, dF < \infty,$$

it follows that $S$ is an S-operator. Since $S$ is a fixed S-operator independent of $n$, it follows from Theorem 3.5 that $\lambda_n$ is compact. Consequently $\mu_n$ is shift compact and the theorem is proved.
Gaussian distributions are defined in the Hilbert space in exactly the same manner as in Definition 2.4. We shall now prove

Theorem 5.9: A distribution $\mu$ on $X$ is Gaussian if and only if $\hat\mu(y)$ is of the form

$$\hat\mu(y) = \exp\left[i(x_0, y) - \tfrac12 (Sy, y)\right]$$

where $x_0$ is a fixed element and $S$ an S-operator.

Proof: Let us take a countable dense subset $y_1, y_2, \ldots, y_n, \ldots$ in $X$ and consider the map $T$ from $X$ to $Z^\infty$, the countable product of the circle groups, defined as

$$T(x) = \left[e^{i(x, y_1)}, \ldots, e^{i(x, y_n)}, \ldots\right].$$

Let $H$ be the image of $X$ under $T$ in $Z^\infty$. Then $T$ is a both ways measurable isomorphism of the two groups $X$ and $H$. If $\mu$ is Gaussian on $X$ then $\mu T^{-1}$ is Gaussian in $H$ and hence in $Z^\infty$. From Theorem 2.5 we have

$$\widehat{\mu T^{-1}}(\theta) = \theta(Z) \exp[-\varphi(\theta)] \qquad (5.15)$$

where $\theta$ is a character on $Z^\infty$, $Z$ is a fixed element of $Z^\infty$ and $\varphi$ a function with the properties specified in Theorem 2.5; $\theta(Z)$ denotes the value of the character $\theta$ at the point $Z$. Further, any character $\theta$ on $Z^\infty$ is of the form

$$\theta(Z) = Z_1^{n_1} Z_2^{n_2} \cdots Z_k^{n_k} \qquad (5.16)$$

where $n_1, \ldots, n_k$ are integers and $Z_1, \ldots, Z_k$ the first $k$ coordinates of $Z$ in $Z^\infty$.

Therefore

$$\widehat{\mu T^{-1}}(\theta) = \hat\mu(n_1 y_1 + \cdots + n_k y_k)$$

where $\hat\mu$ is the characteristic function on $X$ and $n_1, \ldots, n_k$ are related to $\theta$ by the relation (5.16). Hence

$$|\widehat{\mu T^{-1}}(\theta)| = |\hat\mu(n_1 y_1 + \cdots + n_k y_k)| = e^{-\varphi(\theta)}.$$

Since for any $\theta$ and $\theta'$

$$\varphi(\theta + \theta') + \varphi(\theta - \theta') = 2[\varphi(\theta) + \varphi(\theta')],$$

it follows that

$$h(y + y') + h(y - y') = 2[h(y) + h(y')] \qquad (5.17)$$

whenever $y, y'$ are of the form $n_1 y_1 + \cdots + n_k y_k$, where

$$h(y) = -\log |\hat\mu(y)|.$$

Since $y_1, y_2, \ldots$ are dense and $h$ is continuous in the norm topology of $X$, the relation (5.17) is valid for any pair $y, y'$. This implies that $h(y)$ can be put as

$$h(y) = \tfrac12 (Sy, y)$$

where $S$ is a positive semi-definite Hermitian operator. From the continuity of $\hat\mu(y)$ in the S-topology it follows that $S$ is an S-operator. If we now consider the distribution $\lambda$ on $X$ defined by the equation

$$\hat\lambda(y) = \exp\left[-\tfrac12 (Sy, y)\right],$$

we have $\widehat{\lambda T^{-1}}(\theta) = e^{-\varphi(\theta)}$. Hence $\lambda T^{-1}$ is a shift of $\mu T^{-1}$. Since both $\lambda T^{-1}$ and $\mu T^{-1}$ give unit mass to the subgroup $H$ of $Z^\infty$, the element $Z$ by which $\lambda T^{-1}$ is shifted to obtain $\mu T^{-1}$ belongs to $H$, and hence $Z = T(x_0)$ for some $x_0 \in X$. Consequently

$$\mu = \lambda * \delta_{x_0}, \quad \text{or} \quad \hat\mu(y) = \exp\left[i(x_0, y) - \tfrac12 (Sy, y)\right].$$

Conversely, if $\hat\mu(y)$ is of the form $\hat\mu(y) = \exp\left[i(x_0, y) - \tfrac12 (Sy, y)\right]$, the distribution $\mu T^{-1}$ in $Z^\infty$ is Gaussian and hence so is $\mu$.
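In finite dimensions, the passage from the parallelogram relation (5.17) to a quadratic form is the polarization identity. The sketch below assumes $X = \mathbb{R}^n$ and a real symmetric $S$ (the text allows Hermitian operators on a possibly complex space), and takes $h(y) = \tfrac12(Sy, y)$ as the Gaussian exponent; the factor $\tfrac12$ is an assumed normalization and any positive multiple behaves the same way. It confirms (5.17) and recovers the matrix of $S$ from $h$ alone. All function names are hypothetical.

```python
def quad(S, y):
    """(Sy, y) for a real symmetric matrix S stored as nested lists."""
    n = len(y)
    return sum(S[i][j] * y[j] * y[i] for i in range(n) for j in range(n))


def add(y, z, sign=1.0):
    return [a + sign * b for a, b in zip(y, z)]


def parallelogram_holds(h, y, z, tol=1e-9):
    """Relation (5.17): h(y + z) + h(y - z) == 2 [h(y) + h(z)]."""
    lhs = h(add(y, z)) + h(add(y, z, -1.0))
    return abs(lhs - 2.0 * (h(y) + h(z))) <= tol


def recover_S(h, n):
    """Polarization: S_ij = [h(e_i + e_j) - h(e_i - e_j)] / 2 when h(y) = (Sy, y) / 2."""
    e = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    return [[(h(add(e[i], e[j])) - h(add(e[i], e[j], -1.0))) / 2.0
             for j in range(n)] for i in range(n)]


S = [[2.0, 0.5], [0.5, 1.0]]
h = lambda y: 0.5 * quad(S, y)   # plays the role of -log|mu_hat(y)| for a centred Gaussian
print(parallelogram_holds(h, [1.0, -2.0], [0.3, 4.0]))
print(recover_S(h, 2))
```

A function such as $h(y) = \|y\|^4$ fails (5.17), which is why the relation pins $h$ down to a quadratic form.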

We now prove the representation theorem for infinitely divisible distributions.

Theorem 5.10: A function $\hat\mu(y)$ is the characteristic function of an infinitely divisible distribution $\mu$ on $X$ if and only if it is of the form

$$\hat\mu(y) = \exp\left[i(x_0, y) - \tfrac12 (Sy, y) + \int K(x, y)\, dM(x)\right] \qquad (5.18)$$

where $x_0$ is a fixed element of $X$, $S$ an S-operator and $M$ a $\sigma$-finite measure giving finite mass outside every neighbourhood of the origin and for which

$$\int_{\|x\| \le 1} \|x\|^2\, dM < \infty.$$

Here $K(x, y)$ is the function

$$K(x, y) = e^{i(x, y)} - 1 - \frac{i(x, y)}{1 + \|x\|^2}.$$

The representation (5.18) of $\hat\mu(y)$ is unique.

Proof: Let $\hat\mu(y)$ be the characteristic function of an infinitely divisible distribution $\mu$. Then in the same manner as in the proof of Theorem 7.1 of Parthasarathy et al (1962a) we can construct a sequence of distributions $\lambda_n$ such that (i) $\lambda_n = e(M_n)$, (ii) $M_n$ increases to $M$, (iii) $\lambda_n \Rightarrow \lambda$, (iv) $\mu = \lambda * \mu_0$, where $\mu_0$ is Gaussian. Now using Theorems 5.7 and 5.9 we can complete the proof. Sufficiency is immediate from Theorems 5.8 and 5.9. Uniqueness is proved in exactly the same manner as Theorem 8.1 of Parthasarathy et al (1962a), keeping in mind that the space $X$ playing the role of the character group is connected.
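The exponent in (5.18) is linear in the triple $(x_0, S, M)$, so replacing the triple by $(x_0/n, S/n, M/n)$ produces an $n$-th root of $\hat\mu$; this is the formal reason every function of this form is infinitely divisible. The sketch below evaluates (5.18) on $\mathbb{R}^d$ for a finite measure $M$ with finitely many atoms (assumptions made for illustration; `cf` and the atom-list format are hypothetical, and the factor $\tfrac12$ before $(Sy, y)$ is an assumed normalization). It checks $\hat\mu(0) = 1$, $|\hat\mu(y)| \le 1$ and the square-root property.

```python
import cmath


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def K(x, y):
    """K(x, y) = exp(i(x, y)) - 1 - i(x, y) / (1 + ||x||^2)."""
    u = inner(x, y)
    return cmath.exp(1j * u) - 1 - 1j * u / (1 + inner(x, x))


def cf(y, x0, S, atoms):
    """exp[i(x0, y) - (1/2)(Sy, y) + sum_k m_k K(x_k, y)] for M = sum_k m_k * delta(x_k).
    S must be symmetric positive semi-definite; atoms is a list of (mass, point) pairs."""
    d = len(y)
    Sy = [sum(S[i][j] * y[j] for j in range(d)) for i in range(d)]
    exponent = 1j * inner(x0, y) - 0.5 * inner(Sy, y)
    exponent += sum(m * K(x, y) for m, x in atoms)
    return cmath.exp(exponent)


x0 = [0.5, -1.0]
S = [[1.0, 0.2], [0.2, 0.5]]
atoms = [(0.7, (0.3, 0.1)), (2.0, (-1.5, 2.0))]
y = [1.0, 2.0]
half = cf(y, [v / 2 for v in x0], [[s / 2 for s in row] for row in S],
          [(m / 2, x) for m, x in atoms])
print(abs(half ** 2 - cf(y, x0, S, atoms)))   # numerically zero
```

The bound $|\hat\mu(y)| \le 1$ holds because the real part of the exponent is $-\tfrac12(Sy, y) + \sum_k m_k[\cos(x_k, y) - 1] \le 0$.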


6. COMPACTNESS CRITERIA

In the present section we will find out the necessary and sufficient conditions in order that a sequence $\mu_n$ of infinitely divisible distributions may be weakly conditionally compact.

If $\mu$ is any infinitely divisible distribution, by $\mu = [x_0, S, M]$ we will mean that the three quantities occurring in the representation of $\mu$ according to Theorem 5.10 are respectively $x_0$, $S$ and $M$. We will associate with any such $\mu = [x_0, S, M]$ another S-operator, which we will denote by $T$:

$$(Ty, y) = (Sy, y) + \int_{\|x\| \le 1} (x, y)^2\, dM(x). \qquad (6.1)$$

$T$ is an S-operator since $\int_{\|x\| \le 1} \|x\|^2\, dM(x) < \infty$.
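In $\mathbb{R}^d$, with $M$ a finite sum of point masses (an assumption for illustration), (6.1) says $T = S + \sum_{\|x_k\| \le 1} m_k\, x_k x_k^{\mathsf T}$, so that $\operatorname{trace} T = \operatorname{trace} S + \int_{\|x\| \le 1} \|x\|^2\, dM$. This is the finite-dimensional counterpart of the remark that $T$ is an S-operator. A minimal sketch with hypothetical names:

```python
def T_operator(S, atoms):
    """Matrix of T in (6.1): S plus m_k x_k x_k^T for every atom (m_k, x_k) with ||x_k|| <= 1.
    Atoms outside the unit sphere do not contribute to T."""
    d = len(S)
    T = [row[:] for row in S]
    for m, x in atoms:
        if sum(c * c for c in x) <= 1.0:
            for i in range(d):
                for j in range(d):
                    T[i][j] += m * x[i] * x[j]
    return T


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


S = [[1.0, 0.0], [0.0, 0.5]]
atoms = [(2.0, (0.5, 0.0)), (1.0, (0.0, 0.8)), (3.0, (2.0, 2.0))]   # the last atom lies outside
T = T_operator(S, atoms)
print(trace(T))   # trace S + sum of m_k ||x_k||^2 over the atoms inside the unit sphere
```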
Lemma 6.1: In order that a sequence $\mu_n$ of Gaussian distributions with covariance operators $S_n$ be shift compact it is necessary and sufficient that $S_n$ should be compact. [If $\mu$ is Gaussian with covariance operator $S$, its characteristic function is $\exp[i(x_0, y) - \tfrac12 (Sy, y)]$.]

Proof: Sufficiency is immediate from Theorem 3.4. We will prove necessity. If $\mu_n$ is shift compact then $|\mu_n|^2$ is compact. But $|\mu_n|^2$ is Gaussian with mean zero and covariance operator $2S_n$:

$$|\hat\mu_n(y)|^2 = \exp[-(S_n y, y)].$$

From Theorem 3.3 it follows that, for a fixed $\varepsilon$ with $0 < \varepsilon < 1 - e^{-1}$, there exists a compact sequence $U_n$ of S-operators such that

$$1 - |\hat\mu_n(y)|^2 \le (U_n y, y) + \varepsilon.$$

Hence there is a $\delta$ such that for any $n$, $(U_n y, y) < \delta$ implies $(S_n y, y) < 1$. From this, by homogeneity, we can deduce that

$$(S_n y, y) \le \delta^{-1} (U_n y, y).$$

Since $\delta$ is independent of $n$ and $U_n$ is compact, $S_n$ is also compact.
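Lemma 6.2: Let $\mu$ be a symmetric infinitely divisible distribution with $\mu = [0, 0, M]$. Further, let $M$ be concentrated in the unit sphere. Then

$$\int \|x\|^4\, d\mu \le \int \|x\|^4\, dM + 3\left(\int \|x\|^2\, dM\right)^2 < \infty.$$

Proof: It is enough to prove the lemma when $M$ is finite, since the other case can be obtained by a limit transition. Let $M(X) = t$ and let $F$ be the distribution such that $F(X) = 1$ and $M = tF$. Then

$$\mu = e(M) = e^{-t} \sum_{r=0}^{\infty} \frac{t^r}{r!} F^{*r},$$

$$\int \|x\|^4\, dF^{*r} = E\|X_1 + \cdots + X_r\|^4$$
$$= r E\|X_1\|^4 + r(r-1)\left(E\|X_1\|^2\right)^2 + 2r(r-1) E(X_1, X_2)^2$$
$$\le r E\|X_1\|^4 + 3r(r-1)\left(E\|X_1\|^2\right)^2.$$

Expectation is with respect to $F$, and $X_1, \ldots, X_r$ are independent random variables in $X$ with the same distribution $F$. Terms with zero expectation have been omitted, and the last inequality uses $E(X_1, X_2)^2 \le E\|X_1\|^2 \|X_2\|^2 = (E\|X_1\|^2)^2$. Hence

$$\int \|x\|^4\, d\mu \le e^{-t} \sum_{r=0}^{\infty} \frac{t^r}{r!} \left[ r E\|X_1\|^4 + 3r(r-1)\left(E\|X_1\|^2\right)^2 \right]$$
$$= t E\|X_1\|^4 + 3t^2 \left(E\|X_1\|^2\right)^2$$
$$= t \int \|x\|^4\, dF + 3\left(t \int \|x\|^2\, dF\right)^2$$
$$= \int \|x\|^4\, dM + 3\left(\int \|x\|^2\, dM\right)^2.$$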
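The moment expansion in the proof of Lemma 6.2 can be checked exactly when $F$ is a symmetric distribution on finitely many points of $\mathbb{R}^2$, an assumption that turns every expectation into a finite sum. The sketch enumerates all $r$-tuples to compute $E\|X_1 + \cdots + X_r\|^4$ and compares it with $rE\|X_1\|^4 + r(r-1)(E\|X_1\|^2)^2 + 2r(r-1)E(X_1, X_2)^2$. It then sums the Poisson weights $e^{-t}t^r/r!$ to compare $\int \|x\|^4\, d\mu$ with the bound $\int \|x\|^4\, dM + 3(\int \|x\|^2\, dM)^2$ for $M = tF$. The function names are hypothetical.

```python
import itertools
import math


def moments(points):
    """E||X||^2, E||X||^4 and E(X1, X2)^2 for X uniform on `points`."""
    n = len(points)
    sq = [sum(c * c for c in p) for p in points]
    m2 = sum(sq) / n
    m4 = sum(s * s for s in sq) / n
    cross = sum(sum(a * b for a, b in zip(p, q)) ** 2
                for p in points for q in points) / n ** 2
    return m2, m4, cross


def fourth_moment_of_sum(points, r):
    """E||X1 + ... + Xr||^4 by brute-force enumeration of all r-tuples."""
    total = 0.0
    for tup in itertools.product(points, repeat=r):
        s = [sum(c) for c in zip(*tup)]
        total += sum(c * c for c in s) ** 2
    return total / len(points) ** r


def compound_poisson_fourth_moment(points, t, rmax=60):
    """Integral of ||x||^4 d e(tF): the Poisson mixture of the exact r-fold formula."""
    m2, m4, cross = moments(points)
    total = 0.0
    for r in range(rmax + 1):
        w = math.exp(-t) * t ** r / math.factorial(r)
        total += w * (r * m4 + r * (r - 1) * (m2 ** 2 + 2 * cross))
    return total


pts = [(1.0, 0.0), (-1.0, 0.0), (0.3, 0.4), (-0.3, -0.4)]   # symmetric, inside the unit sphere
print(compound_poisson_fourth_moment(pts, 1.5))
```

The symmetry of `pts` matters: it makes $EX = 0$, which is what lets the cross terms drop out of the expansion.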
Lemma 6.3: Let $\mu_n$ be symmetric infinitely divisible distributions such that $\mu_n = [0, 0, M_n]$, with $M_n$ vanishing outside the sphere $\|x\| \le 1$ for all $n$. If $\mu_n$ is compact then

$$\sup_n \int \|x\|^4\, d\mu_n < \infty.$$

Proof: Since $M_n$ is concentrated in the unit sphere,

$$\int \|x\|^4\, dM_n \le \int \|x\|^2\, dM_n.$$

Theorem 5.6 implies that $\sup_n \int \|x\|^2\, dM_n < \infty$, and hence an application of Lemma 6.2 will complete the proof.


Remark 6.1: In the same manner as Lemma 6.2 it can be shown that if $M$ is symmetric and vanishes outside the sphere $\|x\| \le 1$, then for $\mu = [0, 0, M]$ we have

$$\int (x, y)^2\, d\mu = \int (x, y)^2\, dM \quad \text{for all } y.$$

Lemma 6.4: Let $\mu_n$ be a weakly conditionally compact sequence of symmetric distributions such that

$$\sup_n \int \|x\|^4\, d\mu_n < \infty.$$

Then, if $S_n$ is the covariance operator of $\mu_n$, $S_n$ is compact.

Proof: If $\theta_n$ is a sequence of distributions on the real line such that $\theta_n \Rightarrow \theta$ and $\int x^4\, d\theta_n$ is uniformly bounded, then

$$\int x^2\, d\theta_n \to \int x^2\, d\theta \quad \text{as } n \to \infty.$$

Theorem 3.5 can be applied now and the lemma follows at once.

Theorem 6.1: Let $\mu_n$ be symmetric distributions that are infinitely divisible with representations $\mu_n = [0, S_n, M_n]$. Then in order that $\mu_n$ be compact it is necessary and sufficient that

(i) $M_n$ restricted outside any neighbourhood of the identity is weakly conditionally compact,

(ii) $T_n$ as defined in (6.1) is compact.

Proof: We will first prove sufficiency. Let the unit sphere be chosen as the neighbourhood and let us write $M_n = M_n^{(1)} + M_n^{(2)}$, where $M_n^{(1)}$ and $M_n^{(2)}$ are the restrictions of $M_n$ inside the unit sphere and outside the unit sphere respectively. Since $M_n^{(2)}$ is weakly conditionally compact and $F_n \Rightarrow F$ implies $e(F_n) \Rightarrow e(F)$, it is enough to show that the distributions

$$\lambda_n = [0, S_n, M_n^{(1)}]$$

form a compact sequence. We have

$$1 - \hat\lambda_n(y) = 1 - \exp\left[-\tfrac12 (S_n y, y) - \int [1 - \cos(x, y)]\, dM_n^{(1)}\right] \le \tfrac12 (S_n y, y) + \tfrac12 \int_{\|x\| \le 1} (x, y)^2\, dM_n = \tfrac12 (T_n y, y).$$

Since $T_n$ is compact, sufficiency follows from Theorem 3.4. As for necessity, (i) is a consequence of Theorem 5.3 and (ii) follows from Lemmas 6.1, 6.2 and 6.4.
Theorem 6.2: In order that a sequence $\mu_n$ of infinitely divisible distributions with representations $\mu_n = [x_n, S_n, M_n]$ be shift compact it is necessary and sufficient that

(i) $M_n$ restricted to the complement of any neighbourhood $N$ of the origin is weakly conditionally compact,

(ii) $T_n$ as defined in (6.1) is compact.

Proof: Since $\mu_n$ is shift compact if and only if $|\mu_n|^2$ is compact, what we need are conditions for the compactness of

$$|\mu_n|^2 = [0, 2S_n, M_n + \bar M_n], \quad \text{where } \bar M_n(A) = M_n(-A).$$

In addition we have

$$\int_{\|x\| \le 1} (x, y)^2\, d(M_n + \bar M_n) = 2 \int_{\|x\| \le 1} (x, y)^2\, dM_n.$$

Hence the theorem follows from Theorem 6.1.
Theorem 6.3: In order that $\mu_n$ with the representations $\mu_n = [x_n, S_n, M_n]$ be weakly conditionally compact it is necessary and sufficient that, in addition to the conditions of Theorem 6.2, $x_n$ should be compact in $X$.

Proof: In order to prove the theorem it suffices to show that whenever $\mu_n = [x_n, S_n, M_n]$ is shift compact, $\lambda_n = [0, S_n, M_n]$ is compact. Let $F_n \Rightarrow F$. Then the following convergence takes place in norm:

$$\int \frac{x}{1 + \|x\|^2}\, dF_n \to \int \frac{x}{1 + \|x\|^2}\, dF.$$

Hence we can assume that $M_n$ vanishes for all $n$ outside the sphere $\|x\| \le 1$. Let us now consider

$$f_n(y) = -\tfrac12 (S_n y, y) + \int K(x, y)\, dM_n(x).$$

Then

$$|f_n(y)| \le \tfrac12 (S_n y, y) + \tfrac12 \int (x, y)^2\, dM_n(x) + \int |(x, y)|\, \|x\|^2\, dM_n(x)$$
$$\le (T_n y, y) + \left[\int (x, y)^2\, dM_n \int \|x\|^4\, dM_n\right]^{1/2}$$
$$\le (T_n y, y) + C\, (T_n y, y)^{1/2},$$

where $C$ is a constant independent of $n$. Hence, given any $\rho > 0$, there is a $\delta > 0$ such that $|f_n(y)| < \rho$ whenever $(T_n y, y) < \delta$. But since $\hat\lambda_n(y) = \exp[f_n(y)]$, for any $\varepsilon > 0$ there exists a $\rho > 0$ such that $1 - \mathrm{Re}\,\hat\lambda_n(y) < \varepsilon$ whenever $|f_n(y)| < \rho$. Consequently, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that whenever $(T_n y, y) < \delta$ it follows that $1 - \mathrm{Re}\,\hat\lambda_n(y) < \varepsilon$. Here Re denotes the real part. Now Lemma 5.1 and Theorem 3.5 imply the validity of the theorem, since $\{T_n\}$ is compact.
Remark 6.2: In defining the operator $T$ we could have taken any bounded sphere around the origin instead of the unit sphere. When $M_n$ restricted outside any neighbourhood of the origin is known to be weakly conditionally compact, the compactness of $T_n$ when it is based on some sphere implies the compactness of $T_n$ when it is based on any finite sphere.

Since the representation is unique, one can give conditions for the convergence of $\mu_n = [x_n, S_n, M_n]$ to the distribution $\mu = [x_0, S, M]$ in terms of $[x_n, S_n, M_n]$ and $[x_0, S, M]$. However, we will mention only the following.

Theorem 6.4: Let $\mu_n$ be a sequence such that $\mu_n$ has the representation $\mu_n = [x_n, S_n, M_n]$. If $\mu_n \Rightarrow \mu$, then $\mu$ is Gaussian if and only if

$$M_n(N') \to 0 \quad \text{as } n \to \infty$$

for every neighbourhood $N$ of the origin.

Proof: Let $\mu$ be Gaussian. Since $\mu$ cannot be written as $e(F) * \lambda$ with $\lambda$ infinitely divisible and $F$ a non-zero measure, it follows that $M_n(N') \to 0$ as $n \to \infty$ for every neighbourhood $N$ of the origin. Conversely, if $M_n(N') \to 0$ for every neighbourhood $N$, then in exactly the same manner as in the proof of Theorem 6.1 of Parthasarathy et al (1962a) it can be shown that $\hat\mu(y)$ is of the form

$$\hat\mu(y) = \exp\left[i(x_0, y) - \tfrac12 (Sy, y)\right],$$

which shows that $\mu$ is Gaussian.


7. ACCOMPANYING LAWS

Definition 7.1: A sequence $\alpha_{nj}$ of distributions, $j = 1, 2, \ldots, k_n$, $n = 1, 2, \ldots$, is said to be uniformly infinitesimal if for any neighbourhood $N$ of the origin

$$\lim_{n \to \infty} \inf_{1 \le j \le k_n} \alpha_{nj}(N) = 1.$$

Theorem 7.1: In order that $\alpha_{nj}$ be uniformly infinitesimal it is necessary that

$$\lim_{n \to \infty} \sup_{1 \le j \le k_n} \sup_{\|y\| \le K} |\hat\alpha_{nj}(y) - 1| = 0$$

for every constant $K$.

Proof: This is immediate from Theorem 5.4.


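Theorem 7.2: Let $\alpha_{nj}$ be uniformly infinitesimal symmetric distributions with non-negative characteristic functions. Let

$$\mu_n = \alpha_{n1} * \alpha_{n2} * \cdots * \alpha_{nk_n}, \qquad F_n = \sum_{j=1}^{k_n} \alpha_{nj},$$

and let $\lambda_n$ be defined as $\lambda_n = e(F_n)$. In order that $\mu_n \Rightarrow \mu$ it is necessary and sufficient that $\lambda_n \Rightarrow \mu$.

Proof: Let $\mu_n$ be compact. Since the inequality $e^{x-1} \ge x$ is valid for all real $x$, we have

$$\widehat{e(\alpha_{nj})}(y) = e^{\hat\alpha_{nj}(y) - 1} \ge \hat\alpha_{nj}(y) \quad \text{for } j = 1, 2, \ldots, k_n, \; n = 1, 2, \ldots$$

Since $\hat\alpha_{nj}(y)$ is non-negative, $\hat\lambda_n(y) \ge \hat\mu_n(y)$, or

$$1 - \hat\lambda_n(y) \le 1 - \hat\mu_n(y).$$

From the compactness of $\mu_n$ and Theorem 3.3 it follows that $\lambda_n$ is compact. Now let $\lambda_n$ be compact. It follows from Theorem 6.2 and the remark made after Theorem 6.3 that

(i) $F_n$ restricted to $N'$ is weakly conditionally compact for every neighbourhood $N$;

(ii) the sequence $S_n$ of operators defined by

$$(S_n y, y) = \int_{\|x\| \le t} (x, y)^2\, dF_n(x)$$

is compact for every $t$.

Here $F_n$ denotes the sum $\alpha_{n1} + \alpha_{n2} + \cdots + \alpha_{nk_n}$. We will now show that $\mu_n$ is compact. We have

$$1 - \hat\mu_n(y) \le \sum_{j=1}^{k_n} [1 - \hat\alpha_{nj}(y)] = \int [1 - \cos(x, y)]\, dF_n(x)$$
$$= \int_{\|x\| \le t} [1 - \cos(x, y)]\, dF_n + \int_{\|x\| > t} [1 - \cos(x, y)]\, dF_n$$
$$\le (S_n y, y) + 2F_n[\|x\| > t].$$

Since $F_n$ is weakly compact outside any neighbourhood, we can choose $t$ such that for all $n$

$$F_n[\|x\| > t] < \varepsilon/2.$$

Since for that fixed $t$, $S_n$ is compact, Theorem 3.3 shows that $\mu_n$ is compact.

We will now complete the proof by showing that whenever $\lambda_n$ is compact, for every constant $K$

$$\sup_{\|y\| \le K} |\hat\lambda_n(y) - \hat\mu_n(y)| \to 0.$$

To this end it is enough to show that

$$\sup_n \sup_{\|y\| \le K} \sum_{j=1}^{k_n} [1 - \hat\alpha_{nj}(y)] < \infty,$$

because, all the $\hat\alpha_{nj}(y)$ lying in $[0, 1]$,

$$|\hat\lambda_n(y) - \hat\mu_n(y)| \le \sum_{j} \left| e^{\hat\alpha_{nj}(y) - 1} - \hat\alpha_{nj}(y) \right| \le \tfrac12 \sup_{j} [1 - \hat\alpha_{nj}(y)] \sum_{j} [1 - \hat\alpha_{nj}(y)],$$

and the supremum over $j$ tends to zero uniformly on $\|y\| \le K$ by Theorem 7.1. But the expression to be bounded is equal to

$$\sup_n \sup_{\|y\| \le K} \int [1 - \cos(x, y)]\, dF_n \le \sup_n \sup_{\|y\| \le K} \int_{\|x\| \le 1} [1 - \cos(x, y)]\, dF_n + \sup_n 2F_n(\|x\| > 1)$$
$$\le \frac{K^2}{2} \sup_n \int_{\|x\| \le 1} \|x\|^2\, dF_n + 2 \sup_n F_n(\|x\| > 1) < \infty.$$

The last step follows since $\lambda_n = e(F_n)$ is compact.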
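A one-dimensional sketch of the comparison between $\mu_n$ and its accompanying law $\lambda_n = e(F_n)$ in Theorem 7.2, under assumptions chosen only for illustration: $X = \mathbb{R}$, $k_n = n$, and every $\alpha_{nj}$ is the symmetric two-point law at $\pm 1/\sqrt n$, so $\hat\alpha_{nj}(y) = \cos(y/\sqrt n)$, which is non-negative on $|y| \le K$ as soon as $K/\sqrt n \le \pi/2$ (here $n \ge 4$ suffices for $K = 3$). The code measures $\sup_{|y| \le K} |\hat\mu_n(y) - \hat\lambda_n(y)|$ together with the bound $\tfrac12 \sup_j[1 - \hat\alpha_{nj}] \sum_j [1 - \hat\alpha_{nj}]$ used in the proof. The function name is hypothetical.

```python
import math


def accompanying_gap(n, K=3.0, grid=601):
    """X = R, k_n = n, every alpha_nj the two-point law at +-1/sqrt(n).
    Returns (sup gap, sup bound) over a grid of |y| <= K, where
      gap   = |prod_j alpha_hat_nj(y) - exp(sum_j [alpha_hat_nj(y) - 1])|
      bound = (1/2) max_j [1 - alpha_hat_nj(y)] * sum_j [1 - alpha_hat_nj(y)]."""
    gap = bound = 0.0
    for step in range(grid):
        y = -K + 2.0 * K * step / (grid - 1)
        a = math.cos(y / math.sqrt(n))      # identical for every j
        mu_hat = a ** n                     # characteristic function of mu_n
        lam_hat = math.exp(n * (a - 1.0))   # characteristic function of e(F_n)
        gap = max(gap, abs(mu_hat - lam_hat))
        bound = max(bound, 0.5 * (1.0 - a) * n * (1.0 - a))
    return gap, bound


for n in (10, 100, 1000):
    print(n, accompanying_gap(n))
```

Both quantities shrink roughly like $1/n$ here, which is the uniform convergence on bounded spheres that the proof establishes in general.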
Lemma 7.1: Let $\alpha_{nj}$ be uniformly infinitesimal. Then, if $a_{nj}$ is defined by

$$a_{nj} = \int_{\|x\| \le 1} x\, d\alpha_{nj},$$

we have

$$\sup_{1 \le j \le k_n} \|a_{nj}\| \to 0 \quad \text{as } n \to \infty.$$

Proof: Let $\varepsilon > 0$ be arbitrary and $V$ the sphere $\|x\| < \varepsilon$. Then

$$\|a_{nj}\| \le \int_{\|x\| \le 1} \|x\|\, d\alpha_{nj} \le \varepsilon + \alpha_{nj}(V').$$

Hence $\limsup_{n \to \infty} \sup_{1 \le j \le k_n} \|a_{nj}\| \le \varepsilon$. Since $\varepsilon$ is arbitrary the lemma follows.

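Lemma 7.2: Let $\alpha_{nj}$ be uniformly infinitesimal, and let $a_{nj}$ be defined as in Lemma 4.6. Then, if $\theta_{nj} = \alpha_{nj} * \delta(-a_{nj})$, there exists an $n_0$ such that for all $1 \le j \le k_n$ and $n > n_0$ we have

$$\left\| \int_{\|x\| \le 1} x\, d\theta_{nj} \right\| \le 2\theta_{nj}[\|x\| > \tfrac12].$$

Proof: Let $n_0$ be so chosen that

$$\sup_{1 \le j \le k_n} \|a_{nj}\| < 1/4 \quad \text{for all } n > n_0.$$

Then

$$\int_{\|x\| \le 1} x\, d\theta_{nj} = \int_{\|x - a_{nj}\| \le 1} (x - a_{nj})\, d\alpha_{nj} = \int_{\|x - a_{nj}\| \le 1} x\, d\alpha_{nj} - \int_{\|x\| \le 1} x\, d\alpha_{nj} + a_{nj}\, \alpha_{nj}[\|x - a_{nj}\| > 1].$$

The sets $\{\|x - a_{nj}\| \le 1\}$ and $\{\|x\| \le 1\}$ differ only within $\{3/4 < \|x\| < 5/4\}$, and $\|x - a_{nj}\| > 1$ implies $\|x\| > 3/4$. Therefore, for $n > n_0$ and $1 \le j \le k_n$,

$$\left\| \int_{\|x\| \le 1} x\, d\theta_{nj} \right\| \le \tfrac54\, \alpha_{nj}[\|x\| > 3/4] + \tfrac14\, \alpha_{nj}[\|x\| > 3/4] \le 2\theta_{nj}[\|x\| > \tfrac12],$$

since $\|x\| > 3/4$ implies $\|x - a_{nj}\| > 1/2$.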

Lemma 7.3: Let $F_n$ be a sequence of $\sigma$-finite measures such that $F_n$ restricted to $N'$ is finite and weakly conditionally compact for every neighbourhood $N$ of the origin. Then for any $\varepsilon > 0$ there exists a compact set $K$ such that

$$F_n(K') < \varepsilon \quad \text{for } n = 1, 2, \ldots$$

Proof: Let $\varepsilon$ be any positive number. Choose a sequence $N_i$ of neighbourhoods of the origin decreasing to the origin. Let $A_i$ be defined as $N_{i-1} - N_i$, $N_0$ being taken as the whole space. From the conditions of the lemma it is possible to find a compact set $K_i$ in $A_i$ such that, for all $n$,

$$F_n(A_i - K_i) < \varepsilon / 2^i.$$

Let $K$ be defined as

$$K = \bigcup_{i=1}^{\infty} K_i \cup \{0\}.$$

Since $K \cap N_i' = K_1 \cup K_2 \cup \cdots \cup K_i$ and $N_i$ decreases to the origin, it follows that $K$ is compact and

$$F_n(K') \le \sum_{i=1}^{\infty} F_n(A_i - K_i) < \varepsilon.$$


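We now proceed to prove the main theorem of the section.

Let $\alpha_{nj}$ be a uniformly infinitesimal sequence of distributions on $X$. Let $\mu_n$, $a_{nj}$, $\theta_{nj}$, $\lambda_n$ be defined as follows:

$$\mu_n = \alpha_{n1} * \alpha_{n2} * \cdots * \alpha_{nk_n},$$
$$a_{nj} = \int_{\|x\| \le 1} x\, d\alpha_{nj},$$
$$\theta_{nj} = \alpha_{nj} * \delta(-a_{nj}),$$
$$\lambda_n = e(\theta_{n1}) * \delta(a_{n1}) * e(\theta_{n2}) * \delta(a_{n2}) * \cdots * e(\theta_{nk_n}) * \delta(a_{nk_n}).$$

In what follows we will adopt the above notation.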
Theorem 7.3: If $\mu_n$ is shift compact, so is $\lambda_n$.

Proof: Since $\mu_n$ is shift compact, $|\mu_n|^2$ is compact. But

$$|\mu_n|^2 = |\theta_{n1}|^2 * |\theta_{n2}|^2 * \cdots * |\theta_{nk_n}|^2.$$

It follows now from Theorem 7.2 that if one defines

$$F_n = \sum_{j=1}^{k_n} |\theta_{nj}|^2, \qquad (7.1)$$

then $e(F_n)$ is compact. We can now apply Theorem 6.2 and Lemma 7.3 and deduce that for any $\varepsilon > 0$ there exists a compact set $K$ such that

$$F_n(K') < \varepsilon \quad \text{for } n = 1, 2, \ldots \qquad (7.2)$$

Let us now define the S-operators $T_n$ by the formula

$$(T_n y, y) = \int_{\|x\| \le t} (x, y)^2\, dF_n(x).$$

Then for any finite $t$ the sequence $T_n$ is compact.

Let us now define $G_n$ as

$$G_n = \sum_{j=1}^{k_n} \theta_{nj}. \qquad (7.3)$$

In order to complete the proof we have to show that $e(G_n)$ is shift compact, or

(a) $G_n$ is weakly conditionally compact when restricted outside any neighbourhood of the identity;

(b) if the S-operators $S_n$ are defined as

$$(S_n y, y) = \int_{\|x\| \le 1} (x, y)^2\, dG_n,$$

then $S_n$ is compact.

Since $\alpha_{nj}$ is uniformly infinitesimal, by Lemma 7.1 the $\theta_{nj}$ are also uniformly infinitesimal. Hence for any $\varepsilon > 0$ there exists a compact set $C$ such that

$$\theta_{nj}(C) > 1 - \varepsilon \quad \text{for all } n \text{ and } 1 \le j \le k_n.$$

In the same manner as in the proof of Theorem 5.1 of Parthasarathy et al (1962a) we have, for all $n$ and $1 \le j \le k_n$,

$$|\theta_{nj}|^2(K') = \int \theta_{nj}(K' + x)\, d\theta_{nj}(x) \ge \int_C \theta_{nj}(K' + x)\, d\theta_{nj}(x) \ge (1 - \varepsilon)\, \theta_{nj}(K_1') \qquad (7.4)$$

where $K_1 = K + C$ is another compact set. In a similar manner it can be shown that if $V$ and $W$ are two neighbourhoods of the origin such that $V + V \subseteq W$ and

$$\theta_{nj}(V) > 1 - \varepsilon \quad \text{for all } n > n_0 \text{ and } 1 \le j \le k_n, \qquad (7.5)$$

then for $n > n_0$ and $1 \le j \le k_n$

$$|\theta_{nj}|^2(V') \ge (1 - \varepsilon)\, \theta_{nj}(W'). \qquad (7.6)$$

(7.2), (7.3) and (7.4) imply that for any $\varepsilon > 0$ there exists a compact set $K_1$ such that

$$G_n(K_1') < \varepsilon \quad \text{for all } n. \qquad (7.7)$$

On the other hand (7.1), (7.3), (7.6) and the weak compactness of $F_n$ when restricted to the complement of any neighbourhood of the origin imply that for any neighbourhood $N$ of the origin

$$\sup_n G_n(N') < \infty. \qquad (7.8)$$

(7.7) and (7.8) prove that $G_n$ is weakly conditionally compact when restricted outside any neighbourhood of the origin. To prove (b) let us consider

$$\int_{\|x\| \le 1} \int_{\|z\| \le 1} (x - z, y)^2\, d\theta_{nj}(x)\, d\theta_{nj}(z) = 2\theta_{nj}(\|x\| \le 1) \int_{\|x\| \le 1} (x, y)^2\, d\theta_{nj} - 2\left[\int_{\|x\| \le 1} (x, y)\, d\theta_{nj}\right]^2.$$

Since $\|x - z\| \le 2$ on the domain of integration, the left-hand side does not exceed $\int_{\|x\| \le 2} (x, y)^2\, d|\theta_{nj}|^2(x)$. Since $\theta_{nj}$ is uniformly infinitesimal we can assume that $\theta_{nj}(\|x\| \le 1) > \tfrac12$ for all suitably large $n$ and for all $1 \le j \le k_n$. Hence, summing over $j$ and taking $t = 2$ in the definition of $T_n$,

$$(S_n y, y) \le (T_n y, y) + 2(U_n y, y)$$

where

$$(U_n y, y) = \sum_{j=1}^{k_n} \left[\int_{\|x\| \le 1} (x, y)\, d\theta_{nj}(x)\right]^2.$$

Since we know that $T_n$ is compact, in order to show that $S_n$ is compact it is enough to prove that $U_n$ is compact. We will show now that the trace of $U_n$ tends to zero as $n \to \infty$, and this would complete the proof.

Let us put

$$w_{nj} = \int_{\|x\| \le 1} x\, d\theta_{nj}.$$

Then for the trace of $U_n$ we have

$$\operatorname{trace} U_n = \sum_{j=1}^{k_n} \|w_{nj}\|^2 \le \sup_{1 \le j \le k_n} \|w_{nj}\| \cdot \sum_{j=1}^{k_n} \|w_{nj}\|.$$

From Lemmas 7.1 and 7.2 and (7.8) it follows that $\operatorname{trace} U_n \to 0$ as $n \to \infty$.


Theorem 7.4: If $\lambda_n$ is shift compact, so is $\mu_n$.

Proof:

$$1 - |\hat\mu_n(y)|^2 = 1 - \prod_{j=1}^{k_n} |\hat\theta_{nj}(y)|^2 \le \sum_{j=1}^{k_n} [1 - |\hat\theta_{nj}(y)|^2]$$
$$\le 2\sum_{j=1}^{k_n} [1 - \mathrm{Re}\,\hat\theta_{nj}(y)] = 2\sum_{j=1}^{k_n} \int [1 - \cos(x, y)]\, d\theta_{nj}$$
$$= 2\int [1 - \cos(x, y)]\, dG_n(x)$$
$$\le 2\int_{\|x\| \le t} [1 - \cos(x, y)]\, dG_n(x) + 4G_n[\|x\| > t]$$
$$\le \int_{\|x\| \le t} (x, y)^2\, dG_n(x) + 4G_n[\|x\| > t]$$
$$= (S_n y, y) + 4G_n[\|x\| > t]. \qquad (7.9)$$

From the shift compactness of $\lambda_n$, for any $\varepsilon > 0$ we can choose $t$ such that

$$G_n[\|x\| > t] < \varepsilon/4 \quad \text{for all } n \qquad (7.10)$$

and the sequence of operators $S_n$ defined by

$$(S_n y, y) = \int_{\|x\| \le t} (x, y)^2\, dG_n(x)$$

is compact. From (7.9) and (7.10), the compactness of $S_n$ and Theorem 3.3 the present theorem follows.


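Theorem 7.5: Let $\lambda_n$ be shift compact. Then for any finite number $t$ we have

$$\lim_{n \to \infty} \sup_{\|y\| \le t} |\hat\mu_n(y) - \hat\lambda_n(y)| = 0.$$

Proof: Since $\theta_{nj}$ is uniformly infinitesimal, it follows from Theorem 7.1 that

$$\lim_{n \to \infty} \sup_{1 \le j \le k_n} \sup_{\|y\| \le t} |\hat\theta_{nj}(y) - 1| = 0.$$

Hence it is enough to show that

$$\sup_n \sup_{\|y\| \le t} \sum_{j=1}^{k_n} |\hat\theta_{nj}(y) - 1| < \infty. \qquad (7.11)$$

To this end we have

$$\hat\theta_{nj}(y) - 1 = \int [e^{i(x, y)} - 1]\, d\theta_{nj}$$
$$= \int_{\|x\| \le 1} [e^{i(x, y)} - 1 - i(x, y)]\, d\theta_{nj} + i\int_{\|x\| \le 1} (x, y)\, d\theta_{nj} + \int_{\|x\| > 1} [e^{i(x, y)} - 1]\, d\theta_{nj},$$

$$|\hat\theta_{nj}(y) - 1| \le \tfrac12 \int_{\|x\| \le 1} (x, y)^2\, d\theta_{nj} + \|y\| \left\| \int_{\|x\| \le 1} x\, d\theta_{nj} \right\| + 2\theta_{nj}[\|x\| > 1].$$

(7.11) follows at once from the compactness of $S_n$, Lemma 7.2 and (7.8).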
Theorem 7.6: In order that $\mu_n * \delta(x_n) \Rightarrow \mu$, where $x_n$ are arbitrary points in $X$, it is necessary and sufficient that $\lambda_n * \delta(x_n) \Rightarrow \mu$.

Proof: This is an immediate consequence of Theorems 7.3, 7.4, 7.5 and 5.5.

Theorem 7.7: The limit distribution of sums of independent uniformly infinitesimal random variables in $X$ is infinitely divisible.

Proof: Theorems 7.6 and 5.1 imply the present theorem, since $\lambda_n$ is infinitely divisible for each $n$.

Theorem 7.8: Let $\mu_n * \delta(x_n) \Rightarrow \mu$. In order that $\mu$ may be Gaussian it is necessary and sufficient that for each neighbourhood $N$

$$\lim_{n \to \infty} G_n(N') = 0,$$

where $G_n = \sum_{j=1}^{k_n} \theta_{nj}$.

Proof: This follows immediately from Theorems 7.6 and 6.4.
ON THE USE OF A DISTRIBUTION-FREE PROPERTY IN
DETERMINING A TRANSFORMATION OF ONE VARIATE
SUCH THAT IT WILL EXCEED ANOTHER WITH A
GIVEN PROBABILITY*

By SAM C. SAUNDERS
Boeing Scientific Research Laboratories, Seattle, Washington

SUMMARY. Let $X$ and $Y$ be independent random variables, with continuous distributions $F \in \mathfrak{F}$ and $G \in \mathfrak{G}$, on the sample space $\mathfrak{X}$. Let $\omega$ be a homeomorphism from $\mathfrak{X}$ onto itself and define

$$H(\omega) = \int F\omega\, dG.$$

From samples of $X$ and $Y$ we form $\hat F$ and $\hat G$, which are estimates of $F$ and $G$ respectively, and define an estimate of $H$, say $\hat H$,

$$\hat H(\omega) = \int \hat F\omega\, d\hat G$$

for $\omega \in \Omega$, a class of homeomorphisms linearly ordered by $H$. Interpreting $\omega(Y)$ as the strain under use $\omega$ and $X$ as the strength (of some device), $H(\omega) = P[X \le \omega(Y)]$ represents the unreliability and $\hat H(\omega)$ is an estimate of it. If we use $\hat H$ to determine a use $\hat\omega$, then what is the probability that the true unreliability $H(\hat\omega)$ is too large? We examine this problem under the assumptions that $\hat F F^{-1}$ and $\hat G G^{-1}$ are distribution-free with respect to $\mathfrak{F}$ and $\mathfrak{G}$ respectively. This provides an answer in some cases and allows one to obtain stochastic bounds in others.
1. GENESIS OF THE PROBLEM

We were asked to consider the problem of determining the unreliability of a missile fuel tank for a given weight of liquid hydrocarbon fuel to be installed. The volume of the tank, the specific weight of a batch of fuel and the temperature environment were stochastic variables.

In the temperature-pressure range for this specific problem, considerations of the physical and chemical properties of the fuel show that the design pressure is exceeded if the following event obtains: $[X \le w g(Y)]$, where $X$ is the random volume of the tank, $w$ is the weight of the fuel installed and $g$ is a known function (determined by the bulk modulus of the fuel, the range of temperature variation and the design pressure of the tank) of $Y$, the random specific weight of the fuel batch.


If a weight $w$ of fuel is installed, the design unreliability of the tank, i.e., the probability of the pressure exceeding the design specification, is $H(w) = P[X \le w g(Y)]$. If the distribution of $X$ were $F$ and that of $Y$ were $G$, both known, then we could express

$$H(w) = \int F(w g(y))\, dG(y).$$

* This work was done while the author was on leave at the Mathematics Research Centre, U.S. Army, University of Wisconsin, and was issued as MRC Technical Report No. 225, March 1961.
This equation defines the function $H$ and hence

(a) for a given weight $w$ of fuel installed we can determine the design unreliability $H(w)$;

(b) for a specified design unreliability of at most $\varepsilon$ we can seek the maximum weight $w$ for which we have $H(w) \le \varepsilon$.

However, in practice the distributions $F$ and $G$ are usually not known, and we have only sample values of tank volumes and sample values of the specific weights of fuel batches with which to arrive somehow at answers which correspond to the cases (a) and (b).

It is the study of the statistical problems for the situations (a) and (b), within the more inclusive formulation of the problem that we propose, which constitutes the subject matter of this note.


2. INTRODUCTION AND RELATED RESULTS

Let $(\mathfrak{X}, \mathfrak{A})$ be a measurable space for which $(\mathfrak{X}, \le)$ is a partially ordered set, i.e., the relation is reflexive, transitive and such that $x \le y$ and $y \le x$ imply $x = y$. If the relation is measurable, i.e., for all $x \in \mathfrak{X}$ the set $\{y \in \mathfrak{X} : y \le x\}$ is in $\mathfrak{A}$, then a distribution can be defined on $\mathfrak{X}$ for each measure $P$ on $\mathfrak{A}$ by $F(x) = P[X \le x]$, with the usual interpretation of $X$ as the identity random variable (r.v.).

If for each $x \in \mathfrak{X}$ the set $[X = x]$ has measure zero, then the distribution $F$ is continuous. Now $\mathfrak{X}$ is called here a positive sample space if and only if (iff) for each $x \le y$, $x \ne y$, the set $[X > x] \cap [X \le y]$ has positive $P$-measure for all probability measures $P$ under consideration.

Let $X$ and $Y$ be random variables taking values in the positive sample space $\mathfrak{X}$ with continuous distributions $F$ and $G$ respectively. (The generalization to $Y$ taking values in a space different from $\mathfrak{X}$ will be seen to be immediate.) Let $\Omega$ be a set of transformations of $\mathfrak{X}$ onto itself. For each $\omega \in \Omega$ the probability that $X$ precedes $\omega(Y)$ according to $\le$ is given by

$$H(\omega) = P[X \le \omega(Y)] = \int F\omega\, dG \qquad (2.1)$$

where we make the convention that juxtaposition of functions refers to composition and integrals are understood to be over the entire space $\mathfrak{X}$.

Let us assume that while $F$ and $G$ are unknown we can obtain samples of $X$ and $Y$ from which we form, respectively, the estimates $\hat F$ and $\hat G$ of the distributions. Thus we define an estimate of $H$, say $\hat H$, by

$$\hat H(\omega) = \int \hat F\omega\, d\hat G \qquad (2.2)$$

for each $\omega \in \Omega$ for which the integrals exist.


Some questions of interest are: In what sense is $\hat H$ a good estimate of $H$? If $\omega$ is known, what is the distribution of $\hat H(\omega)$? But primarily we want the maximum transformation $\hat\omega$ (as a function of $\hat H$) for which, for each $F \in \mathfrak{F}$, $G \in \mathfrak{G}$, we have

$$P[H(\hat\omega) > \varepsilon] \le \alpha$$

where $\alpha$ and $\varepsilon$ are small and specified in advance. That is, we are interested in the problem of using $\hat H$ to obtain what would be tolerance limits were $H$ a probability distribution on $\Omega$.

If $\Omega$ consists of a single point $\omega$, we can without loss of generality assume it is the identity transformation. We then have the problem of estimating $p = P[X \le Y]$ from samples of $X$ and $Y$.

Suppose that $\mathfrak{X}$ is the real line with the usual ordering and $\mathfrak{G}$ is the class of continuous distributions. Then it is well known that the empiric distribution, say $\hat G$, is the unique minimum variance unbiased estimate of $G$. But, further, $\hat p$ defined analogously to (2.2) is the unique minimum variance unbiased estimate of $p$, and $mn\hat p$ is the Mann-Whitney statistic, where $m$ and $n$ are the sample sizes used in computing $\hat F$ and $\hat G$.
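On the real line with $\hat F$ and $\hat G$ taken to be the empirical distribution functions, (2.2) at the identity transformation becomes $\hat p = \frac1n \sum_j \hat F(Y_j)$, so $mn\hat p$ is simply the number of pairs with $X_i \le Y_j$, the Mann-Whitney count. A short sketch (the function names are ours):

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function x -> #{s <= x} / len(sample)."""
    s = sorted(sample)
    return lambda x: bisect_right(s, x) / len(s)


def p_hat(xs, ys):
    """Integral of F_hat dG_hat: the average of F_hat(y_j) over the Y-sample."""
    F_hat = ecdf(xs)
    return sum(F_hat(y) for y in ys) / len(ys)


def mann_whitney_count(xs, ys):
    """Number of pairs (i, j) with x_i <= y_j."""
    return sum(1 for x in xs for y in ys if x <= y)


xs = [0.3, 1.2, 2.5, 0.8]   # sample of X (m = 4)
ys = [1.0, 2.0, 3.0]        # sample of Y (n = 3)
print(p_hat(xs, ys), mann_whitney_count(xs, ys))
```

With continuous distributions, ties occur with probability zero, so the choice between $\le$ and $<$ in the count does not matter.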

The problem analogous to ours, i.e., involving $\hat p$ and $p$, has been studied and bounds for the corresponding estimates have been obtained for large sample sizes under the assumption that one of the distributions is known (Birnbaum, 1956) and assuming that neither of the distributions is known (Birnbaum and McCarty, 1958).


3. RESULTS

Let $\preceq$ be a partial ordering on the set of all transformations of $\mathfrak{X}$ onto itself for which $(\Omega, \preceq)$ is a linearly ordered subset which is order complete. That is, $(\Omega, \preceq)$ is a partially ordered set for which any two elements are comparable and each non-void subset of $\Omega$ which has a lower bound has an infimum.

For a sample of independent r.v.'s each with the same distribution $F \in \mathfrak{F}$, the estimate $\hat F$ (which is a measurable function of the sample) of $F$ is ample for $\mathfrak{F}$ iff the random function $\hat F F^{-1}$ has the same probability law for every $F \in \mathfrak{F}$.

We now make our assumptions:

(1) $\mathfrak{F}$ and $\mathfrak{G}$ are classes of continuous distributions on the partially ordered positive sample space $(\mathfrak{X}, \le)$.

(2) $(\Omega, \preceq)$ is a linearly ordered complete space of transformations on $\mathfrak{X}$ such that

$$\omega \preceq \omega' \text{ implies } \omega(x) \le \omega'(x) \text{ for all } x \in \mathfrak{X}. \qquad (3.1)$$

(3) $\hat F$ and $\hat G$ are ample estimates for $F \in \mathfrak{F}$ and $G \in \mathfrak{G}$, respectively.

Since there always exists a set of transformations on $\mathfrak{X}$, say $\Gamma$, such that $\mathfrak{F} = \{F\gamma : \gamma \in \Gamma\}$, and similarly $\mathfrak{G} = \{G\lambda : \lambda \in \Lambda\}$, where $\Gamma$ and $\Lambda$ each contain the identity function, it is seen in this representation that $\hat F$ is ample iff $\hat F(F\gamma)^{-1}$ is distribution-free with respect to $\Gamma$. We shall adopt this representation wherever convenient, without comment, in what follows.
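The set

$$\Phi = \{\gamma\omega\lambda^{-1} : \lambda \in \Lambda,\ \gamma \in \Gamma,\ \omega \in \Omega\} \qquad (3.2)$$

is a set of transformations on $\mathfrak{X}$ which is, in general, partially ordered by $\preceq$. Let

$$H_0(\phi) = \int F\phi\, dG, \qquad (3.3)$$

$$\hat H_0(\phi) = \int \hat F\phi\, d\hat G \qquad (3.4)$$

be two functions, the second random, defined for each $\phi \in \Phi$ for which the integrals exist. (The functions $\hat F$ and $\hat G$ may both contain discontinuities and these cannot be made to coincide.)

We now define, for $\delta$ an element of the range of $\hat H$,

$$\hat\omega_\delta = \inf\{\omega \in \Omega : \hat H(\omega) > \delta\}, \qquad (3.5)$$

which by completeness of $\Omega$ is a r.v. Let $\varepsilon$ be in the range of $H$ and $H_0$. We now pick the unique

$$\theta_\varepsilon \in \Omega \text{ such that } H_0(\theta_\varepsilon) = \varepsilon, \qquad (3.6)$$

and there exists a unique $\omega_\varepsilon \in \Omega$ such that $H(\omega_\varepsilon) = \varepsilon$. (3.7)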
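For the scale transformations used in Section 4 ($\omega(x) = \omega x$ on $(0, \infty)$), the random element $\hat\omega_\delta$ of (3.5) is easy to compute if, purely for illustration, $\hat F$ and $\hat G$ are taken to be empirical distribution functions (this is our assumption, not the estimates used in Section 4). Then $\hat H(\omega) = \frac{1}{mn}\#\{(i, j) : X_i \le \omega Y_j\}$ is a right-continuous step function with jumps at the ratios $X_i / Y_j$, and the infimum in (3.5) is attained at one of them. The names below are hypothetical.

```python
import math


def H_hat(xs, ys, omega):
    """(2.2) with empirical F_hat and G_hat: (1/mn) #{(i, j): x_i <= omega * y_j}; ys > 0."""
    return sum(1 for x in xs for y in ys if x <= omega * y) / (len(xs) * len(ys))


def omega_hat(xs, ys, delta):
    """(3.5): inf{omega > 0 : H_hat(omega) > delta}.
    H_hat(omega) > delta iff #{ratios <= omega} >= floor(delta * mn) + 1,
    so the infimum is the (floor(delta * mn) + 1)-th smallest ratio x_i / y_j."""
    ratios = sorted(x / y for x in xs for y in ys)
    k = math.floor(delta * len(ratios))
    if k >= len(ratios):
        raise ValueError("delta must be smaller than the largest value of H_hat")
    return ratios[k]


xs, ys = [1.0, 2.0], [1.0, 4.0]
print(omega_hat(xs, ys, 0.5), H_hat(xs, ys, omega_hat(xs, ys, 0.5)))
```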


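We now have the following theorem.

Theorem 1: Let assumptions (1), (2), (3) be true and let $H$ and $\hat H_0$ be defined as in (2.1) and (3.4) respectively. Then if

$$\Omega \supseteq \Phi$$

we have

$$P[H(\hat\omega_\delta) > \varepsilon] \le P[\delta \ge \hat H_0(\theta_\varepsilon)] \qquad (3.8)$$

with the right-hand side of (3.8) being a distribution-free bound for every $(F, G) \in \mathfrak{F} \times \mathfrak{G}$.

Proof: It is clear by (3.1) that with probability one $\hat H$ is monotone increasing on $\Omega \supseteq \Phi$ (always $\theta_\varepsilon \in \Omega$). We have by the positivity of the sample space that $H$ is strictly monotone on $\Omega$, and thus we have

$$[H(\hat\omega_\delta) > \varepsilon] = [\hat\omega_\delta \succ \omega_\varepsilon] \subseteq [\delta \ge \hat H(\omega_\varepsilon)] \qquad (3.8.1)$$

because $\hat H(\omega) \le \delta$ for every $\omega \prec \hat\omega_\delta$. Now since $\gamma\omega_\varepsilon\lambda^{-1} = \theta_\varepsilon$, by (3.7.1) we have $\hat H(\omega_\varepsilon) = \hat H_0(\theta_\varepsilon)$; that $\hat H_0(\theta_\varepsilon)$ has a distribution independent of $(F, G)$ is clear by assumption (3).

Remark 1: If $\hat H$ is continuous a.s. then we obtain equality between the probabilities of equation (3.8).

Remark 2: An example of a situation when the assumption $\Omega \supseteq \Phi$ is satisfied occurs when $\Omega$ is a semi-group, with respect to composition, of homeomorphisms on $\mathfrak{X}$ and $\Gamma$ and $\Lambda$ are subsets of $\Omega$. We exhibit a trivial instance of this kind in the next section.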
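We now state the following theorem.

Theorem 2: Let assumptions (1), (2), (3) be true and let $H$ and $\hat H_0$ be defined as in (2.1) and (3.4) respectively. Then if for each $(\gamma, \lambda) \in \Gamma \times \Lambda$ we have

$$\omega\lambda \succeq \gamma\omega \quad \text{for all } \omega \in \Omega, \qquad (3.9)$$

then we have

$$P[H(\hat\omega_\delta) > \varepsilon] \le P[\delta \ge \hat H_0(\theta_\varepsilon)] \qquad (3.10)$$

with the right-hand side being a distribution-free bound for every $(F, G) \in \mathfrak{F} \times \mathfrak{G}$.

Proof: It is clear by (3.1) that both $H$ and $\hat H$ are monotone on $\Omega$. Thus we have

$$[H(\hat\omega_\delta) > \varepsilon] = [\hat\omega_\delta \succ \omega_\varepsilon] \subseteq [\delta \ge \hat H(\omega_\varepsilon)].$$

Now by (3.9) we have $\gamma\omega_\varepsilon\lambda^{-1} \preceq \omega_\varepsilon$, and hence

$$\varepsilon = H(\omega_\varepsilon) = H_0(\gamma\omega_\varepsilon\lambda^{-1}) \le H_0(\omega_\varepsilon);$$

but $\varepsilon = H_0(\theta_\varepsilon)$, so $H_0(\theta_\varepsilon) \le H_0(\omega_\varepsilon)$, and since $\theta_\varepsilon, \omega_\varepsilon \in \Omega$ we have $\theta_\varepsilon \preceq \omega_\varepsilon$. But then $\hat H(\omega_\varepsilon) \ge \hat H_0(\theta_\varepsilon)$, and it follows that

$$[\delta \ge \hat H(\omega_\varepsilon)] \subseteq [\delta \ge \hat H_0(\theta_\varepsilon)]$$

and the theorem is proved.

Remark 3: A useful condition which implies assumption (3.9) of Theorem 2 above, that $\omega\lambda \succeq \gamma\omega$, is

$$\gamma \preceq \lambda \text{ and either } \lambda\omega \preceq \omega\lambda \text{ or } \gamma\omega \preceq \omega\gamma. \qquad (3.10.1)$$

In most instances this means only that one r.v. is stochastically smaller than the other and that one of the r.v.'s has a distribution which satisfies a convexity type condition.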
Let us make the additional assumption

(4) $\gamma$, $\lambda$ are 1-1 transformations in $\Gamma$ and $\Lambda$ respectively.

We can in this case define

$$\hat H_0(\phi) = \int \hat F\gamma^{-1}\phi\lambda\, d\hat G. \qquad (3.11)$$

We remark that, assumption (3) being satisfied, $\hat H_0(\phi)$ has a distribution which depends only on $\phi \in \Phi$. Let

$$\Phi_\alpha = \{\phi \in \Phi : P[\hat H_0(\phi) > \varepsilon] = \alpha\}. \qquad (3.12)$$

For $\alpha \in (0, 1)$, $\Phi_\alpha$ is not empty whenever there exists $\theta \in \Omega$ such that $P[\hat H_0(\theta) > \varepsilon] = \alpha$.

We now state the following theorem without proof:

Theorem 3: Let assumptions (1), (2), (3) and (4) be true, with $H$ and $\hat H_0$ defined as in (2.1) and (3.11) respectively. Then, if there exists a unique largest element of $\Phi_\alpha$, say $\phi_\alpha$, writing $\hat\omega = \gamma^{-1}\phi_\alpha\lambda$ we have

$$P[H(\hat\omega) > \varepsilon] \le P[\hat H_0(\phi_\alpha) > \varepsilon] = \alpha \qquad (3.14)$$

and the right-hand side of (3.14) is distribution-free.

Now we define

$$\hat\omega_\alpha = \sup_{\gamma \in \Gamma,\ \lambda \in \Lambda} (\gamma^{-1}\phi_\alpha\lambda) \qquad (3.15)$$

as the element to be used in our estimate.

Remark 4: In case there exists an $\omega_\alpha$ such that $\gamma^{-1}\phi_\alpha\lambda = \omega_\alpha \in \Omega$, we obtain equality between the probabilities in equation (3.14).

4. EXAMPLES 

Let us first give an application of Theorem 1 and take @ = (0, co), 
Q= (o : ок) = ox, o> 0), i.e., here we take the transformations о to be scalar multi- 
plication by a positive constant to which wegive the same designation. Let the order- 
ings <, < on О and æ be the same and be the usual ordering on the real line. Let 
Fo and б, be distributions on @ and take 8 = {0,0 : 660}, S = (Fo : weQ}. 
Without loss of generality we may assume that ИХ = 1/у and EY = 1/A. Now we 
define F(x) = ЕХ), G(x) = 92У) and these two estimates are ample for 5 and 
4 respectively. 

The assumptions of Theorem 1 are seen to be satisfied and @ is the unique 
element of Q such that 


J Fux) аба) = е ME 


but then бє = yo,A-! so a.s. we have by the corollary 


(о) = H,(0,) = I F .(® 46, ( (4.2) 


x 
ux Р) AY ) 
which must needs be tabulated at points of interest. 


The integrations (4.1) and (4.2) might be difficult to perform and could require 
numerical integration techniques depending upon F, and G,. In order to continue, 
let us make the mathematically convenient assumption that the densities exist and 
are given by _ 


£/2)—1,—2, 
МӘ) = em Me) et 


where Ё is a known parameter of the chi-square distribution. 
Thus from (4.1) we obtain 


Т воду ае = F eif y)dy = (1+2/0)-* 
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and бе = 2/(e/*—1), but fortunately this integration solves (4.2) and we seo 


Яве ) = (14+ 2yX/0a¥)-¥2 
from whieh the probability distribution сап be found by a simple transformation, 
using tables of the F distribution, since УХ has a chi-square distribution with 
kn degrees of freedom and 2AY has a chi-square distribution with 2m degrees of freedom 
where n and m are the sample sizes used in'computing X and Y, respectively. 


5. AMPLE ESTIMATES OF DISTRIBUTIONS 


In this note we restrict ourselves to estimates Ñ of F which are ample. This 


property could be of some importance in other applications since in particular Р 
ample for ¥ tells us that the analogue of the D, statistic, say, 


D= sup |F()—FG)| = sup ПИРУ) 
EGC (0, 1) 


is distribution-free with respect to <7. 

Ampleness of an estimate of a distribution F is a strong property since, in 
many instances, it allows us to place confidence contours along the entire distribution 
function, Suppose Ё is ample for <, a set of distributions on the real line, and ¢ is 


cata. ST 
some distribution function on the unit interval which is in the range of ЎР and for 
some constant f we have 


p = PIËF-= <0 > POR < Р] 


where ¢-1(¢) = inf {æ :((2) > t} then {-1Ё provides a lower contour for Р with p 
confidence. 

We have ampleness within those families of distributions on the real line for 
which one can estimate percentiles and bounds on those estimates from observations 
in the region of central tendency such as normal, log-normal, exponential, ete. But 
we may have ampleness for distributions on spaces of higher dimension, as the follow- 
ing two examples show. 

Let @ be the set of continuous distributions on any partially ordered set (2, x) 
as defined in Section 2. For y c 4? define с(., y) to be the indicator (characteristic) 
function of the set [Y < y] Then 


Ue) = = p oe, Y) sv (6.1) 


is an ample estimate of the distribution @ e @ from the sample (Fare К) ОЁ inde- 


pendent observations each with distribution С. We shall call G the empiric cumu- 
lative. Я f 
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Let 4p be Euclidean p-space, with < the usual ordering, the elements of which 
we shall take to be column vectors. Let X be 77 (д, Х)* and let D be the non-singular 
matrix such that DED’ = I. We shall say here, for brevity, that D diagonalizes 
X. Then we can write F(x) = F,[D(z—j)] for æ e 49, where Р, is the distribution of 
а 72(0, 1) variate. 


Let f; and Ў be the usual maximum likelihood estimates taken from a sample 


of size n > р and define D as the diagonalizing matrix for X. It exists a.s. Now if 


we define 
(=) = Р(#—и) и. (5:2) 


and the corresponding definition of 5 then fix y 42 and set 


T = ёо-ҷу) = Ё” Чу) p. (68) 
that this has a distribution which is the same for all д, X is sufficient for ampleness. 
But clearly D(ĝ— p») is z(o, = I ) which is sufficient that the vector in square 
brackets in (5.3) satisfies the condition. 


Set B = DD. We show that this has a distribution independent of у, X 
also. From (5.2) we have 
^» 


ЁрЁ vB = 1. ... (5.4) 
But nD X D' is distributed as X ZZ, where Z; is 72 (0, 1) independent of Z; (i + j). 
i=1 


Hence, except for zero probability; Ê diagonalizes a random matrix which 
has a distribution independent of y, X. Note that Т”Т is, except for a scale factor, 
Hotelling’s 7? with non-centrality factor y'y[n. 

6. SOME REMARKS ON PROPERTIES OF THE ESTIMATE Н 
Let us set P = EP with similar meaning for @ and H. By applying the 
.Fubini theorem and integrating by parts we have for о 
H(o)-— f Рой 
and H(o)—H(o) = f Fed(G—G)-- | (Fo— Fo)dG. 

Remark 5: Ё is an unbiased estimate of H on Q if F and Ẹ are unbiased 
estimates of F and G, respectively. 

The converse is not true without additional assumptions. For the real line, 
unbiasedness of Ë on the entire set of 1—1 order preserving maps of 4? onto itself is 


sufficient for unbiasedness of # and @ when one imposes the conditions that Fand @ 
are continuous. Е 


*1.0., X is à normal vector with mean vector # and variance-covariance matrix X. 
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We say that P, is consistent for F whenever 
sup |.É—F | PS Ө 


where, as before, the subseript n refers to the sample size used to obtain the estimate. 
We shall refer to consistency exclusively in this sense. 7 
Now : 
(о) Но) = Г(@—@)айо-- f Go —Fo)dG. 


"Thus it follows that | (o) — H(o)| «sup |F—F|--sup 19—61]. 


Remark 6; И Ё, and Gn are consistent for F and G, respectively, then 
Bonn is consistent for H uniformly on Q as 1/n+1/m—0. 

Remark 7: If T and д. are the empiric cumulatives, then Hy», for each 
в О, is consistent, unbiased and asymptotically normal with 1/n+-1/m—0. 


This follows from the known behaviour of U-Statistics. ў 


If we assume that # is consistent and G is the empiric cumulative, we should 


of Ë “stabilizes” rapidly enough. We do not attempt 
following remark. 


ction H defined by 


obtain asymptotic normality 
to find the weakest of such conditions, but we have the 


Remark 8: If Gis the empiric cumulative, then the fun 
uU m > 
Нм(®) = am E „(®Ў;) г 


is a consistent estimator of Н and asymptotically normal if 1/n--1/m0 in such a way 


that улт sup |F,—F| 2 0. 


Proof: Let Zm = a tror) - Ho and define Emn by the equation 


mH, ()— H(o)] = бя That Zai is asymptotically normal is clear, and the 
result follows if È, 0, which is guaranteed by the hypothesis. 


7. AN APPLICATION TO RELIABILITY THEORY 


Let 42 = (0, оо) and let Y(o) be the demand time for à particular equipment 
in a specified environment during an inspection period oflength o. Let X be the life 
length of this equipment under continued use in that environment. 

We want the maximum o such that for given e < 0 

` PIX < Ү(®)] < є. 
We particularize by making the assumption that Y(o) 


without loss of generality we suppose that a scaling factor 


we(0, 1). № 
Ev но) = PIX < oY] = J Foyaew DFe 
е independent тув with continuous distributions 


denotes multiplication and 
has been introduced во that 


where we assume that X and Y ar 
Ё and G respectively. 
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-Now if F is a distribution with derivative f then the function у’ defined by 
«2 e б 
у) = I-FG) for x > .. (ТШ 
is called the failure rate (or hazard rate). 


А common and intuitively appealing assumption concerning life length distri- 
butions is 


(A) the failure rate of X exists and is non-decreasing. 
We also assume 


(B) X is stochastically larger than Y, i.e., F < б. 
But, further, without loss of generality we assume 


goer is given and P[X < У] > є. "ERU 
Let у(&) = 1 у(00 for «>0: 
0 
'Then if we set F(x) = G(x) = 1—7°, х > 0, 


from (7.1) we see that G(x) = @ (=). 

It is clear that (B) implies у < A and we now show that (A) implies yo < oy 
and thus (3.10.1) implies assumption (3.9) of Theorem 2. То see this, note that 
у must be convex and hence y(wx) < wy(x) for w (0, 1) since y(0) = 0. This is pre- 
cisely our condition. 

We now seek the unique @ such that 


Ї ron) dq) = є. e. (13) 


"This may be easily integrated directly ог by comparison with (4.1) seen to be 0, = Ls А 
We now choose Ё, @ to be the empiric cumulatives as defined in (5.1) (which 

are ample). Hence we need tabulations of the statistic 

sh 

mn 


Ao) = p Pr3o10G3.— 1 È È eoU, V) 
i21 j=1 


where Р(Х) = V;, i=1,..,m, and б(Ү) = О, for j = 1,..., m are all inde- 
pendent r.v.'s uniform on (0, 1). 
Consider the events 
А, = [Ув «0 < Vua =0,...,2, ~ 
where Vq,... Vi; are the ordered observations of V with Vio = 0, Vins) =L 
"These events А; form а disjunct partition of the sample space. 


It is clear that B) = U,,, is the well-known Mann-Whitney statistic 
based on samples of size m and n. Now let S,,(0) = mnH((0), and it follows that 
IanO) |At= Ung for Ё = 0,1, ..., m. 
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With the convention that Шш = 0 with probability one, we obtain 


P[S,,(0) < t] INA P[S,í(0) < t| APA] 
= E Non РШ & tl, (0,1. mn. 
We know that ES,,(0)— = 0 and since we have 


EU, =" (п-т) m2, 
it follows by (7.4) directly that 
ES? (0) = È Wk n, 0)EUS; 
k=0 


+00 (m. т 
оо (nln rn 


Hence we obtain the variance of S,,(0) as 


—1)0 
san8,,00) = "= [m- 7 на н. 


The results of the preceding section show that S,,(0) has asymptotically 
a normal distribution with the above mean and variance; however, for small sample 
sizes and small 0 the normal approximation may not be of sufficient accuracy. In 
the following paragraph we give a few formulae in the range of t = ттд small. 
This is near the region of interest since d is small. 

Let Р,„() = PLU mn <. Now, from results and recurrence formulae 
given by Mann and Whitney (1947), one can obtain the following : 


For mzlkzo, 
P,(0) = pe ) for all k > 0, 
1 h=0, 
r= rtg b>l, 
1 if m<2or k=0,1, 
d if m>2ondk > % 
1 i£ m<3or_ = 0,0. 


Р„(3) = eo] "ү ) if m> 3 and k> 2, 


and for greater values of t the numerators are functions of higher powers of k. 
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We have the expression | 


Hin, 6) _ E(m--n, т, 0) 
Утв m+n 
opp te A 
where the notation Z is that used in the Harvard Tables of the Binomial Distribution.* 
Through the use of these tables, for moderate values of m, n and 0, the probability 
(7.5) can be caleulated arid for very small values of 0 the first terms of the expansion 

given сап be used for adequate approximation. 


Similarly : 


F[S4,() < 0] = 2 (7.5) 


2E(m--n, m+1, 0) 
pute 
m 
for» 2 1,m 2 1, 


P[S,0) < 1] = (1-0) 4-——— 


E(m-4-n, m4-2, 0) 
Ete 
m 
for n > 1, m > 2. 
We now return to the reliability problem. From the formulae preceding 
(or others similarly developed) we discover, through tabulation, the «-th percentile 
of the г.у; S,,,(0), thus we find ĝa, dividing by mn, such that 


and Р[8,„(0) < 2] = (1—0)" --n6(1—0)" 4- 


РЙ) < 84] = a. 


Then we set б = inf {06(0, 1) : Êlo) > a) 
which is the empiric derating percentage sought. We know by the assumptions (A) 
and (B) that the theorem applies and thus 


РІН) > €] < а. 


In this case, for example, if we chose 6 = 0 we would find that 


^ _ min X; 


/ max Y; 
which is intuitively appealing. 
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STUDENTISATION OF TWO-STAGE SAMPLE MEANS FROM 
NORMAL POPULATIONS WITH UNKNOWN VARIANCES: 


IL CONFIDENCE ESTIMATION FOR THE MEAN OF A 
STRATIFIED POPULATION 


By HAROLD RUBEN? 
Columbia University 


SUMMARY. The two-stage sampling procedure discussed in the first paper? of the present series 
is used to obtain confidence intervals of predetermined length and confidence coefficient for the mean of a 
stratified population with normal components. The efficiency of the procedure in relation to the confidence 
estimation with fixed sample sizes, when the variances of the component strata are known, is discussed 


briefly. 


For samples of fixed size the estimation of the overall mean Ji of a stratified 
(weighted) population with г components having means and variances ш; and o? and 
weights р = 1, 25. т), the reliability of the unbiassed minimum variance estimator 
involves the (generally unknown) 0%. Accordingly, it has frequently been suggested 
(e.g. Cochran, 1953) that pilot samples be first drawn to gain information about the 
c? (and incidentally also to enable the optimum allocation of sampling among the strata 
to be determined). This suggestion tallies well with the two-stage sampling procedure 


discussed in this paper. ў 
We shall consider here briefly the problem of obtaining confidence intervals 
of predetermined length 2a and predetermined. confidence coefficient 1—4 for 


u= & Pili (1) 


when the components of the population are normal. The appropriate random variable 


and the sampling procedure (nor +++» nor ky, «009 kp) В used, 
where the procedure S may be described às follows (Ruben, 1962). Preliminary inde- 
pendent random samples of size nli = 1, 2, 5 т) are drawn from the r populations, 
the sample observations being zy (= b 2,55 }=1% е» fii) шше; inde- 
pendent random samples of size %—Toi Are drawn from the r populations according 


to the rule 


is clearly = p) 
1 


щ = шах({ 8 Noi) = 1, 2, vat) (2) 
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where 
82; = (ny—1)3 [2 a-(3 а)! | по |, s. (2.1) 


and {с} denotes the smallest integer not exceeded by c. The 2; are the sample 
means based on all the observations, 


ni 
% = E tyn, в. (2.2) 
j=l 
We have (Ruben, 1962, equ. (3.11) ) that in the limit as о; со, 
r r * 
£c z p) ~ 2 Pi tul o ew (3) 


where the tro; are independent Student variables with vo; degrees of freedom, and 


Уш = 7—1. 
The distribution of z is not available, and for simplicity we choose 
| hp =A 6—1,2,.., т), (4) 


where A has to be chosen appropriately. Further, we choose, again for simplicity, 
Nor = Nog = *** = Nor = Np (say). 


On setting 
El Жл, 5 
Ея (5) 
T »—2 4H 
where vp = 1,—1 and w= ( EE ) х 2m .. (6) 


(w is a standardised variate with zero mean and unit variance), we find, on using the 
Cornish-Fisher expansion (1937), the following approximate relation between the 
percentile points of 2 and £, where Ё is normal with zero mean and unit variance, 


TYo 


+ 
fala = (5095) - Ue Ea Ear) 
+{5ESja-+(06r—104) 8 a+ (210—2880) 5.06, =. (7) 


terms of order l/wj being disregarded. Since in order to meet the specifications we 
require 2/2 = а, this, taken in conjunction with (4), yields 


+ 
= (Sog) PE De зв) 


{BER 2+ (96r—104) Ez /o-+(219—288r)Eqjo}/(96r2vg)], ... (8) 
to the same order of accuracy as Zaja in (7).* 


* Detailed examination of higher order terms in the series expansion for k; indicates that the dis- 
regarding of 3rd order terms is likely to affect only the 4th decimal place, even for moderate ro. 
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ON STUDENTISATION OF TWO-STAGE SAMPLE MEANS 
The value of Qm, the conditional probability for fixed sample sizes m;, corres- 
ponding to the region R in the (E, ..., 2;)-plane defined by 


Е: Ра «a ... (9) 
is 2Ф[щ( отл ЕО) 


where Ф(-), as usual, denotes the distribution function of а normal random variable 


with zero mean and unit variance. Since Q is decreasing in each of the 0, it follows 


from the results of the first paper (Ruben, 1962) of this series that the present 
estimation procedure is conservative in the sense that the true confidence coefficient 
exceeds 1—@. У 

Finally, we investigate the efficiency of the procedure by comparing its average 
sample size with the fixed sample size required in order to meet the given requirements 
when the о? are known. Under these circumstances (using the standardised normal 
variate for the estimation of |), $ 


= бы» Bl) QD 


(È niot) 


n; being the sample size for the i-th stratum. The minimisation of 


ям (12) 
1 
А | 
subject to (11) leads to m= Ei ( Хуту) neo (13) 
(14) 


2 f 2 
whence the minimal is — 2* = Ein ( Ap ) | a. 


On the other hand, for the procedure S(no, «. "tri ky «+s Б), provided the 01 


are not excessively small, 


T f 
Bis $ Вы ~ Eri = win 705 (® iot) [е ve (5) 
ef, 1 0 
The efficiency is therefore Я 
У рс 
at м (Bit oer о (16) 
En а т Y pie 
1 


Since the second term 


on the right of (16) < 1, unless either the p; 7; ате equal or r =1, 
when it equals 1, : 


way € CD (№, ERU 
Efficiency < a (eR) 
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(17) gives an upper bound to the efficiency. Supposing, however, we do not allow 
ourselves the degree of freedom available in the fixed sampling size procedure (when 
the оў are known) but rather consider the class of fixed sampling procedures charac- 
terised by the property 

т т, A? 


pu к БАХ ... (18) 


En, En X 


i.e., we consider only the case where 
m =À (6 = 1, 2, ..., 1), ... (19) 
where A' is yet to be determined. From (11), 
NB = r(Eq/a/a)? 
and the total fixed sample size now becomes 


nis т; = (Ё, а/а)? E pio’. ... (20) 
1 


The ‘conditional’ efficiency of the two-stage sampling estimation procedure in rela- 
tion to the class of fixed sampling size procedures, wherein the fixed sample sizes of 
the populations are allocated in the same ratio as the average sample sizes in the 
two-stage procedure, is then 


НН" 3—2 (fen) 
; X En, Yo ‘Wa 
1 
= Wor? f р 3, OB (007—104)E2,--(219— 2887) 1-2 21 
Yo [ m EM + 96r?v2 ] | E 


independently of the о? (cf. Ruben, 1962). This is also the upper bound to the efficiency, 
in the original and wider sense, as given in (16). The conditional efficiency approaches 
1 rapidly. as узсо. 3 
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STUDENTISATION OF TWO-STAGE SAMPLE MEANS 
FROM NORMAL POPULATIONS WITH 


UNKNOWN VARIANCES: 
IIL JOINT CONFIDENCE ESTIMATION OF A SET OF MEANS 


By HAROLD RUBEN* 
Columbia University 


SUMMARY. Confidence regions of predetermined dimensions and confidence coefficient are 
obtained for the joint estimation of the means of a finite set of unconnected and unknown normal popu- 
lations. The confidence estimation procedure is shown to be conservative in character, and its efficiency 
in relation to the corresponding estimation procedure with fixed sample sizes, when the variances of the 
populations are known, 18 discussed briefly. 

Tho paper concludes with a short discussion of the possibility of increasing the efficiency of the 
two-stage sampling procedure (utilised in the current paper as wi Il as in the preceding two papers? of 
this series), and of some other matters. ) 


DISCUSSION OF THE PROCEDURE 
Given r separate normal populations with unknown means and variances 
дь 01, respectively, we wish to obtain confidence regions for р = (Is ns «s My) of 
fixed. dimensions and confidence coefficient 1—«. We propose to use - 


| +H ecu С 2) 


as the basic random variable and S(ng, ... Mors Ёз» k, [Ruben, 1962] as the 
sampling procedure, with vo; = %i—1- Observe that the random variable chosen is 
proportional to the joint density function of the gt : 

The procedure S may be described as follows: Preliminary independent random 
samples of size 20; (i = 1, 7,...› 7) are drawn from the r populations, the sample ob- 
servations being 21 (# = l, 2 +9 7% j=1, 2 5 то). Further independent random 
samples of size ;—Mp; ate drawn from the r populations according to the rule 


n, = max (Fi hm) @=Ъ Drea ts и) 


where 82; = (n; —1)* [ РЕ — (E zy) [or | (21) 


and (c) denotes the smallest integer not exceeded by c. The z; are the sample 


means based on all the observations, 
ni 
2; = X ty. Be (22) 
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4 Ono would perhaps inti itively expect that this variate will generate confidence Por of Bs 
i ize. Some independent support 18 
i fidence coefficient and total average sample size. 
man Әна Pearson likelihood ratio for testing the hypo- 


lent to the choice of y by the fact that the use of the Neyman- x rati 
thesis p; = uO —1,2,57) with fixed sample sizes Jeads to the same quantity with y, replaced by [DA 
i ^ в 
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` From a result of а previous paper (Ruben, 1962) the limiting distribution 
of y(c;—00) is the distribution of 
r fy; \ 0+1) 
1-7% . s 
О омии, e 
where the ¢,,; are independent Student variables with vp; degrees of freedom, and since 
(1-Е ,/voi) 4, вау Yi is a Beta variate with density function 
r( Yoit-l ) 
FIRE yp (1—уу)-% (0 <y; < 1), ve (4) 
(3) v= 
y is distributed as the weighted product of r such independent Beta variables. ‘The 
distribution of the product of Beta variables is unknown and an approximation for 
the distribution of y is therefore called for. An excellent approximation is provided 
by noting that the distribution of 
—2ny = — = (vo;- 1)Iny; {Б 
is asymptotically (уо Эсо) that of a x? with r degrees of freedom. This suggests that, 
for finite у, —2/ny may be regarded approximately as a Type III Pearson variable, 
and this may be verified by inspection of the cumulants of —2/ny. (For analogous 
approximations see, for instance, Bartlett, 1954.) The latter cumulants are obtained 
by noting that the cumulant generating function of Iny; is 


inp ( Yack? )- mr (3%) + mD (Зач ) — ar (1 +) o @ 


(j =-++М'—1), whence the eumulants, K, (—2lny), may be obtained in terms of the 
Pentagamma functions y*—(.) (British Association, 1931), as follows : 


и. (+1) { yon p yen ner) ) b ... (1) 


where = f(x) = (x) = Я dT), ИО * We) (223... .. (7.1) 


Asymptotic developments + the Pentagamma miis in terms of negative powers 
of the vp; yields corresponding asymptotic developments for the cumulants of —2lny 
in equ. (7), and subsequent inspection of these cumulants in series form confirms 
the closeness of approximation of —2/ny to a Type III Pearson variable. We give 
here only the first two cumulants, disregarding terms of a order than the third: 


k,(—2Iny) = 5 X ИЕ X Itt = x 1/v3;, .. (8) 
1 1 41 
к. 20у) = 2r--6 X 16 X 1/8. 2. (9) 
1 1 
The sought for approximation is then that 
—2ту 


ЕЕ СНЕ (10) 
1--(3/27) = Ио 
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is distributed as a chi-square on 7 degrees of freedom, and with this approxima- 
tion the confidence regions are provided by the relation 


ń та) i ШУУРО 00) 


where X. a is the 100%% significance point of a chi-square with r degrees of freedom. 
The confidence regions are ovaloids with semi-axes 


(1-663/89 Ive) the, -$ 
1, = (vik) | exp { ne = \-1] я MET 


and are hardly distinguishable from ellipsoidal regions* with lengths of semi-axes 


i = (1/6) [(1+-(3/27) уы) Xal- „. 03) 
Equation (12) enables the lengths of the axes to be controlled by an appropriate 
choice of the k;. ^ 
The value of the conditional probability for fied sample sizes m; correspond- 
ing to (11) is 


r 2 g2z2 X —1Uoi-1) t " ^ 
Pr (fi (1+ z ax) > exp|=#(14(6/20E Uru) Ж | |. @9 
where the Ё; are independent standardised normal variables, and is decreasing in 
thec;. (The effect of successively increasing the 0; is to produce a decreasing sequence 
of ovaloids.) It then follows from the results of a previous paper (Ruben, 1962) that 
the present estimation procedure is conservative in character, and the true confidence 
coefficient exceeds 1—4. : 
To investigate the efficiency of the procedure, 
confidence regions #ог_ with confidence coefficient 1 
regions in the space of (JA -.., №), 
X nu my lot < Xia ev (15) 
i = 1, 2, ..., 7) as the lengths of the 


note that when the c; are known 
— are the random ellipsoidal 


where the sample sizes т; are fixed. To obtain Т, ( 
semi-axes, the т; must be chosen so that 


n; = (хао), е9 (16) 
the total number of items sampled being then Е 
n= 4 = а z ožji. T n2) 


On the other hand, when the 0% are unknown the average total sample size necessary 


to meet the same requirements is 


x r т 7 
En = E Bn, S ot = (148/27) зш ж. 202. (18) 
1 1 
The efficiency of the two-stage procedure is then 
(19) 


E -q (8/202 lw) 


; id 
* Bllipsoidal confidence regions would cbe obtained if the variate DE (2;— ui)? Wes ү: м ihe 
starting point rather than у. Similarly, confidence regions of various shapes can bo obtained by taking 
appropriate functions of the 2;— pi- 
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We note that the efficiency is (approximately) a function only of the harmonic 
average of the preliminary sample degrees of freedom. 


CONCLUDING REMARKS 


We conclude the present series of papers on the sampling scheme S(7p, ..., 
Noy; Ky, ..., kp) with some further remarks. These are essentially of the same charac- 
ter as those of a previous paper (Ruben, 1961, Section 6) and will therefore be referred 
to quite briefly. The remarks of the earlier paper related to the case of equal vari- 
abilities for which с is scalar, while here, where the variabilities are totally unknown, 
c is à vector, but the validity of the remarks is not affected by this difference and 
the discussion carries over with suitable and obvious modifications. 


(i) It follows from equ. (3.12) of the first paper of the present series (Ruben, 
1962), that the test procedures discussed in the series remain valid and are furthermore 
conservative even if а has some prior distribution, provided Р(Ё| и, с), the probability 
that the vector of the sample means falls in the critical region R of the test for given ш 
and с, is monotone in each of the о;. Similarly, the confidence estimation procedures 
remain valid and are conservative even if р and o have a joint prior distribution, 
provided P(R|c)! is decreasing in each of the o;. (R is here a region in the space of 
the z; which is used to derive a confidence set for ш). 


(ii) For purposes of inference all probabilities have been evaluated in this 
and previous articles as limiting probabilities, і.е. we have assumed infinitely high 
с; for the sake of safety. In practice, however, safe upper bounds would generally 
be available for the o;, and use of these upper bounds in P(R|, с) will improve on 
the confidence probabilities or risks of errors, etc. 


(ii) Stein's two-stage sampling procedure S(n, k) for the mean of a single 
normal population with unknown variance has been criticised (for references, sce 
Ruben, 1961) on the grounds that information relevant to an assessment of the vari- 
ance is thrown away by the failure to utilise the observations in the second-stage 
sample for this purpose. The criticism obviously carries over to the present procedure 
B(ng; ..., ig; Ё, ..., kp) for the means of several populations. Nevertheless, it may 
to some extent be met by incorporating estimates of the о’, obtained from the totality 
of observations, in the probabilities P(R |, с).2 This will have the effect of increas- 
ing the efficiency of the procedure. 
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258 


THE NON-EXISTENCE OF SOME PARTIALLY BALANCED 
INCOMPLETE BLOCK DESIGNS WITH LATIN 
SQUARE TYPE ASSOCIATION SCHEME 


By 8. S. SHRIKHANDE and N. C. JAIN 
Banaras Hindu, University 
SUMMARY. A method for proving the impossibility of Symmetrical Partially Balanced 


Incomplete Block (PBIB) design with Li association scheme is obtained. The technique employed is that 
used by Ogawa (1959) in proving the nonexistence of some PBIB designs of the triangular type. 


1, INTRODUCTION AND SOME PRELIMINARY RESULTS 


Two symmetric and nonsingular matrices A and В of the same order т with 
re exists a rational non- 


rational elements are said to be rationally congruent, if the 
singular matrix C of the same order such that 


С'АС = B 
We denote this relation by the notation 
(1.2) 


(1.1) 


where C" is the transpose of С (Jones, 1950). 
A^ B. 
It is obvious that this relation satisfies the usual properties of an equivalence relation. 


Denote the n leading principal minor determinants of A by Dı, Dz, Desa] 
invariant of A is given by 


and let D, — 1, then the Hasse-Minkowski p 


суду = C1 1, @ Day 00 (3) 


for each prime p, where (а, 5), denotes {һе extended Hilbert norm residue symbol 


(Jones, 1950; Hensel, 1915) defined by ^ 
1 if a+b = 1 has & p-adie golution 


a, b), = 
wet Re otherwise. 


Then from Hasse (1923) and Jones (1950) we have 

Hasse's Theorem: The necessary and sufficient conditions for two positive 
definite, rational and symmetric matrices A and B of the same order to be rationally con- 
gruent are that the square-free parts of the determinants of both the matrices are the same 
and further the Hasse-Minkowski p-invariants of both the matrices are equal for all 


primes p including Pa: 


(14). 
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We state without proof certain lemmas given by Ogawa (1959). 
Lemma 1: Jf A, B and О are rational nonsingular and symmetric and if 


726) 


мо. 0 
V=( о B o 
оос 
then ` e(U) = (—1, —1) (IA]; 180), (А) ¢,(B). 


eV) = (141, |Bl)p (14|, JD» (181, 19), е4) ¢,(B)e,(C). 
Lemma 2: For an nxn diagonal matrix A,, with each diagonal element d 


nn-+1) 
eA) =(—1, —1),(—1,@), * 


n(n-+1) 
Lemma 3: c (pA) = (—1, p), Е (р, |A|)p-*¢,(A) where т is the order of A. 


Lemma 4: If the n—1 rational column vectors 
az, а; ..., Ap 
of dimensionality n are linearly independent and are orthogonal to 
1) 


(1) (a; ... а,) 
u=(: 
а 


¢,(U) = (—1, —1),. 


then for the Gramian of the set, i.e. 


Lemma 5: So long as we restrict ourselves to rational vectors, the p-invariant 
of the Gramian of the set is uniquely determined by the linear subspace generated by the set. 

Lemma 6: If 

А = еї „+9, 
where Т, is the unit matriz of order n and б, is the square matrix of order n with all elements 
unity, then 
|41 = ge 
where g = e+nf. 
: If A and В are square matrices of order n, and X is a square matrix of order 
ns of the form : 
ue How B 


FAR EDT 
then |X| = |A4—B|*314-L(5—1) В|. 
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We quote for completeness the properties of Hilbert symbol and some of the 
useful properties of the Legendre symbol (a/p) where p is a prime (Ogawa, 1959). 
Lemma 7: 
(a, 5), = (b, a). 
(а, би?) = (а, 5), 
if t and u are any rational numbers. We use the notation аз ~аз to indicate that a, and. 
а» differ only by a factor which is the square of a rational number. Hence àn the caleula- 
tion of the Hilbert symbol the square part of any rational number can be replaced by 1. 
(а, —a), = 1. 
(a, а), = (—1, a)». 
(а, biba) = (а, 6), (а, ba) 
(a, 0), = (—ab, a+b), 
As a special саве of the above we have for any positive integer n 
(n, n4-1), = (—1, n4-1),. 
For any odd prime p (а, p), = (а[р) 
and if integers a and Б are not divisible by р, then 
(a, 0), = 1. 
For the Legendre symbol we haye 
(alp) = (01р) if а = b(mod p), 
(ab[p) = (alp) (6р), 


(p—1) (0—1 
(рар) = (—1) * * 
2—1 Pol 
(—1/р) =(= 2 › (2/0) = (71) 8 
where p and g denote odd primes. - 
Let M be a positive definite, rational, symmetric and stochastic matrix of order 
v. Let ру be the sum of the elements in any row. Then py is an eigen-value of M with 
multiplicity nọ = 1. Suppose further that М has two other distinct and rational 
eigen-values p, and р, with multiplicities n, and m, where тот + № =v. Let X; 
be a matrix of order (v, m) such that its columns generate the eigen-space of M 
corresponding to p;; = 0, 1, 2. We can assume without loss of generality that 
X, X, = Ini ШЕТ) 
X, X; = Onin; o Ch] 
nsists of rational numbers divided by the square-root 
he column number. In particular 


where any column of X; co à 
of the same positive number which depends upon t 


X, is column vector with each component 1/4/v. Then it is easy to show that 
Mes p (1.7) 
o : : 
where A, = (aj) = X; DON ad .. (1.8) 
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is a rational symmetric matrix of order v. In particular A, is a matrix with all ele- 
ments 1/0. Further from (1.5) and (1.6) 


A? =A; 2. — (1:9) 

А.А, = 0,,. Be (1.10) 
From (1.7), (1.9), (1.10) it follows that 

МА; — p; A; E (12) 


which means that А; is the eigen-space corresponding to p;. Without loss of generality 
assume that 


2 
а, ai, aj. ... a, ‚ aj, aj... а, 
are independent column vectors and put 


8 = ( a, ai ... a), a} ... dy, ) = (B, B, Ba). е (1.199 


Then S is a non-singular matrix with rational elements. Put 


Q:= BB, i= 0,1, 2. 2," (1.18) 
Pol 0 0 
Then 8 М8 = 0 PQ 0 ... (1.14) 
0 0 20 
po? 0 0 
or M~ 0 PQ 0 й. (1.15) 
0 0 prs 
1 0 0 
Since 8'8 = 0 Q, 9 
0 0 9. 
We obviously have 
10.161 ~, 807) 


Also, the columns of (В, В») are orthogonal to the vector 1. Hence from Lemmas 
1 and 4 and the fact that the Gramian of (B, B;) is 


Q 0 
Е) 
о Q 
¢,(Q) = (—1, = (161, 1931), ¢,(Q1) ¢,(Q2) = (—1, —1), 
or (16:1, 161) с»(@;) (Q) = 1. абая) 
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From (1.15) to (1.17) using Lemmas 1 and 3 and the properties of the norm-residue 
symbols we have 


сИ) = (= —1 (o; —upnipna), (Po v)p (P Pa |0, | i 


nina таби-Е1) та(эз-Е1). 
(гь рә)» (=LA), * (—1, рз), А 18) 


Hence we can state the following theorem. 


Theorem 1: Zf M is а positive definite, rational, symmetric and stochastic 
matrix of order v with rational eigen-values po, py, pa with multiplicities то = 1), ni, n 
and Q, is the Gramian of the rational vectors generating the eigen-space corresponding to 
py, then ¢,(M) is given by the formula (1.18). 

L; association scheme is defined by Bose and Shimamoto (1952) as follows : 
The number of treatments is n? where n is a positive integer. We assume that i—2 
mutually orthogonal Latin Squares (MOLS) of order n exist (Bose, 1938). We arrange 
the n? treatments arbitrarily in a nx n square Г. Let each of ће i—2 MOLS Г, be 
superimposed оп №. For each treatment 0 in L, define its 1-associates as those treat- 
ments which lie in the same row or column of L as 0 or those which correspond to the 
same symbol of Lj as 0 when L; is superimposed оп L;j = 1, 2,...,1—2. The remain- 
ing treatments are called 2-associates. In this scheme each treatment has ж 
i-associates $ = 1, 2, where 

т =i(n—1), п. = (n4-1—i) (n—1). 2. (119) 
The parameters of the second kind are given by 
n—2)+(i—1)(i—2) (@—1)(®++1—) 
( (n—2)4-6—1)(—2) (1.20) 
(n—i)(n--1—1) t 


20 п) ) 
а ( @—9)++(п—й%—1—1) 


A symmetrical PBIB design with L; scheme is an arrangem 
treatments in v block of size r « v satisfying the following conditions : 
(i) Each treatment occurs in exactly r blocks. 


(ii) No treatment occurs more than once in any block. 
as defined above occur 


Р, = 


ent of v = n? 


(üi) Any two treatments which are i-associates 
together in exactly A; blocks, i = 1, 2. 


Consider a matrix with i rows and л? columns in n distinct symbols having 


. ja 
the property that in any two rows every ordered pair (5) occurs exactly once. Such 


a scheme is called an orthogonal array of strength 2, index unity and D constraints 
(Rao, 1947) and is denoted by (n3, i, т, 2). We state an alternative definition of 
L; scheme as a lemma. 
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Lemma 8: Suppose an orthogonal array (m^, ù, n, 2) exists. Identify the 
columns with n treatments in any arbitrary manner. Define 1-associates of any treatment 
0 as those treatments for which the corresponding columns coincide with the column cor- 
responding to 0 in exactly one position. Define the remaining treatments as 2-associates 
of 0. Then the association scheme thus defined is of L; type. 


The proof of this lemma follows from the equivalence of the existence of i—2 
MOLS of order n and the orthogonal array (л, $, п, 2) as shown by Rao (1947). 


2. MAIN RESULT 


Suppose a symmetrical PBIB design with L; scheme for v — n? treatments 
with parameters r, Ау, A, exists. Define the incidence matrix N = (n) in the usual 
manner. - 


{ 1, if treatment $ occurs in block j, 
ny = 
0 otherwise. 

Then, M=NN' Из (230) 


is a nonnegative rational, symmetric and stochastic matrix with eigen-values (Connor 
and Clatworthy, 1954). 


por 
py = r4 (n—i) 4,—(n4-1—3)As ... (2.2) 
ра = 1-10, +(t—1)A, 

with multiplicities по = 1, m and ng, 


where n; and n, are given by (1.19). We consider the case when р, and p, are both 
positive in which case М satisfies all the conditions of Theorem 1. Then from 

| |М| = r pm рт = |N]? el (2.3) 
it follows that when n is even then p, must be a perfect square if i is odd and р. must 
be a perfect square if i is even. 


We use a method similar to that of Corsten (1960) to obtain the eigen-space 
corresponding to the value р, of М. Let (n?, i, т, 2) be an orthogonal array in n 
integers 1, 2, ..., n. Number the columns from 1 to л? and identify the treatments 
with the columns. Let с; be a column vector with v components formed in the fol- 
lowing manner. Suppose the integer j occurs in the k-th row of the orthogonal array 
in columns numbered Ёд,... bj, then єр; contains 1 in exactly these positions and 0 
elsewhere, k = 1,2, ..., i; j = 1, 2,...,т. From the properties of the orthogonal 
array it is easy to see that for every k 


Xe—1 2. (24) 
ј=1 


and that the set of ni vectors thus formed is of тапк i(n—1)+1. Denote the vector 
space generated by these vectors by B. Then B can also be generated by 1 and п—1 
vectors selected from each of the $ sets C, = (Cy), j = 1, 2,..., n, which form а rational 
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B, be the subspace of B which is orthogonal to 1. Let 2 = (=,), 
| any vector of B,, then 


ion 
= eg Cy t ... (2.5) 
11 
EZ ен = 0. ... (2.6) 


Tat times) is 
: it, +8, | = (227) 
the sum of the coordinates whose positions correspond to 1-asso- 


% 
й 
from Lemma 8 we have 
eos x б, : > (2.8) 
= eee 
= (n—1)X ERES (2.9) 
i 
= = (2.10) 
by, 2 (1m 
а = È бы— ©, 
B, = (n—i)te . Ql) 
у 1t 8, denotes the sum of the coordinates of æ corresponding 
n from the fact that 22 is orthogonal to 1 
2, Sp E, = 0. (2.12) 
у= Ma 
Ya = rte 89 . (213) 


2.11) and (2.12) Ya = Pite x 
у В, is an eigen-vector of М corresponding to ру and Во the г 
1) this implies that the eigen-space corresponding to p, is exactly By. 
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Since В is generated by the rational vectors 1 and n—1 vectors selected from 
each of the $ sets Cr, as also by 1 and а basis of B, which can be taken as rational, 
the two expressions for the Gramian are rationally congruent. Denote by Q, the 
Gramian of B,. Noting that 


1'1- n, 
Ге =n, 
C'i Cy = п, 


сусын = 0 ifjzj, 
е ер = 1 ifkzk. 
We obviously have 


n? nJ'  nJ' es 4) 
nJ nl; Gy x Gar 
т? 0 
eae ~ nJ ба М endis Ga 
nJ Gu ui ^. nl, a 


where J is a column vector with n—1 components all unities. The determinant of 
the matrix on the right hand side can be evaluated by performing elementary 
transformations on the rows and the same transformations on the columns and 
making use of Lemma 6. It is then easily verified that 
IQ] = ті". s. (2.14) 
Substituting in (1.18) and using the properties of the Hilbert symbol, we have 
for any odd prime p 
!om(m-l) nana 1) 
M mina 2 2 in к 
(М) = (Py ра), ~ (—1, 99, (—1, pa), (р: ра, т), + (2.15) 
where лу, ne, p, and p, are given by (1.19) and (2.2). Hereafter we omit to write 
pin the symbol (а, Б), We consider the two cases (i) ¢ odd, (ii) $ even. 


Case (1), i odd : If nis even we already have a necessary condition that р; be 
a perfect square. Then 


: na(na+1) 
сМ) = (—1,р) ? 


The index of the expression on the right hand side is odd if and only if n--1 = 3 
(mod 4), in which case 


(М) = (—1, рз) = —1 
if and only if the square-free part of р, contains a prime == 3 (mod 4). 
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Tf n is odd then 
nim +1) na(na+1) 
cM) = (= р) 2 (=, Pa) 2 (PaPa; n). 


Tt is easy to verify that in this case the above gives 


п—1 
ean = ((—1) * m pipa) 
Case (ii), even : Expression (2.15) reduces to 
mima 1) пз(па- И) 
(М) =(—1, ру) 3 (—1, P) д 


Tf n is even a necessary condition for existence of N is that p, be a perfect square, 
in which case 


ny (Mat 1) 


сМ) = (—1, р) P 
The index in the right hand expression is odd if and only if = 2(mod 4) in which case 


(М) = (—1, ру) 
= —1. : 
if and only if the square-free part of p, contains а prime = 3 (mod 4). 

Noting that M~I, 
we have (М) = c(l) = 1. 

Hence we can state the following two theorems. 

Theorem 2: Necessary conditions for the existence of а symmetrical PBIB 
design for n? treatments with L; association scheme when ру and рз are positive and à їз 
odd are : 

G) ifn is even then pı must be a perfect square and if, further n-+4 = (mod 4), 
then the square-free part of рз does not contain a prime = 3(mod 4); and if ae 
e | 

(ii) m їз odd then (n E m, ppl = 1, where p 18 an odd prime. 

Theorem 3: Necessary condition for the exislence of a symmetrical PBIB 
design for п? treatments with L; association scheme when ру and Pa are positive and à ‘ 
even is that if т is even рз must be a perfect square and if further i = 2 (mod 4), then the 


square-free part of p, does not contain a prime = 3(mod 4). 


i i i) of Theorem 2 
is i ing to note that Theorem 3 is obtained from (i) 0 2, 
козо which implies interchanging 


by replacing 4 by n+ 1—i and interchanging A; and А» 


ру and р». | ч 
We indicate in the following table the nonexistence of certain designs. З 
last column indicates the application of the proper theorem together with value О 


p when necessary. 
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TABLE. IMPOSSIBILITY OF CERTAIN DESIGNS 


n i Lr м Е № тетатКв 
5 3 4 0 1 Theorem 2(Н), p=3 
5 3 4 0 E » » 
7 3 15 5 4 " 2 
7 5 15 4 5 » >в 
7 3 21 10 8 ” ” ” 
Л. 5 21 8 10 т» » y» 
7 3 24 14 10 P "n o» 
7 5 24 10 14 » „ » 

10 5 18 2 4 Theorem 2 (i) 

10 6 18 4 ` 2 Theorem 3 

12 6 45 16 12 Theorem 3 

12 7 45 12 16 Theorem 2(i) 

14 5 40 6 9 Theorem 2 (i) 

14 10 40 9 6 Theorem 3 

14 6 27 6 2 Theorem 3 

14 9 27 2 6 Theorem 2(i) 


Tn conclusion we note that Theorem 1 can be generalised to prove the impos- 
sibility of m associate classes symmetrical PBIB design whenever the eigen-values of 
M = NN’ are rational and positive and it is possible to find the Gramians of the bases 
of m—1 of the m eigen-spaces corresponding to the values ру, ps,..., ри. We also note 
that in case of two associate classes symmetrical PBIB design if p, ~ рь, then one 
need not find the value of |Q,| at all. These points will be considered in a later 
communication. 
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RECOVERY OF INTERBLOCK INFORMATION 


By J. ROY and K. R. SHAH 
Indian Statistical Institute 


SUMMARY. Тһе problem of recovery of inter-block information for a general incomplete block 
design is examined in this paper. "The ratio p of the inter-block variance to the intra-block variance plays 
а key role. Under the so-called Normal cet up, the usual estimator of p is biased ; expressions for the bias 
and variance are derived, Several alternative estimators of p having desirable properties are examined. 
A computational procedure for obtaining the maximum likelihood estimate is given. If a certain type of 
estimator of р is used, the estimators of treatment effects are proved to be unbiased, An expression is 
derived for the increase in the variance of the estimate of a treatment effect due to fluctuations of sampling 
in p. 


1. INTRODUCTION 


Since some incomplete block designs have low efficiency factors, Yates sug- 
gested the use of information available from inter-block comparisons to increase the 
precision of estimates of treatment effects. He called the process recovery of inter- 
block information and showed how this is to be done for а cubic lattice design (1939) 
and a balanced incomplete block (BIB) design (1940). Nair (1944) gave the method 
for partially balanced incomplete block designs and finally Rao (1947, 1956) adopted 
the method for any incomplete block design. 

The process consists in applying the method of weighted least squares to 
intra-block contrasts and inter-block contrasts of observations, for the purpose of esti- 
mating the treatment effects, weights being inversely proportional to the variances 
of these contrasts, The ratio р of the inter-block variance to the intra-block variance 
plays a key role and since this is usually unknown, the ratio of estimates of these 
variances obtained from an analysis of variance of the data is substituted. The 
properties of estimates so obtained have not so far been investigated in detail. Recently, 
Graybill and Weeks (1959) have proved that under the so-called normal model, esti- 
mates of treatment effects so obtained, are unbiased in the case of BIB designs 
and they have also obtained (1961) a minimal set of sufficient statistics. 


The problem of recovery of inter-block information in the case of a general 


incomplete block design is examined in detail in this paper. The usual estimator 
of the variance ratio p obtained from the analysis of variance is found to be biased 
and a simple correction is obtained for this bias. An expression for the variance of 
this estimator is derived. Some alternative estimators of p having desirable properties 


and based on quadratic estimators of inter-block and intra-block variances are also 


proposed. The method of maximum likelihood is shown to give rise to a somewhat 


complicated equation for estimation, and & numerical procedure for solving the equa- 
tion by iteration is presented. an 
Tt is shown further that if a certain type of estimator of the variance ratio 18 


used, the final estimators of treatment effects turn out to be unbiased. It is shown 


that there is an increase in the sampling error of the treatment effects due to the samp- 


ling fluctuation in the variance ratio, and an expression for this increment is derived. 
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Consider an experiment in which v treatments are applied on bk experimental ` 
units or plots, themselves divided into b blocks of k plots each. Only one treatment 
is applied on every plot, the actual allocation being done in the following manner. - 

First we consider a design, that is an arrangement of v symbols (one corres- 
ponding to each treatment) in b rows, each having Ё cells. The arrangement is charac- 
terised by the numbers Mjm j = l, 2,..., 9; i= 1, 2, ..., b; u = 1,2, ..., k where 
туи 1 or 0 according as the j-th symbol (treatment) occurs on the w-th cell of the 
i-th row or not. 

Next, the blocks are numbered 1, 2, ..., b at random and the plots in а block 
are numbered 1, 2, ..., k again at random and independently for different blocks. The 
u-th plot in the i-th block then receives the treatment corresponding to the symbol 
which occurs in the u-th cell of the i-th row of the design. Let y;, denote the yield 
of this plot. Under the assumption that the yield from any plot is the sum of two 
components, one due to the plot and the other due to the treatment, we have 


(Yin) Ad t a) 
tata 2 l a 2 RIS EO rA, 
( в) + (1 ;)ei iiSi, и = и 
сог (Jiu, Yw) = ган (1-5 G ШОУ UE u.s. (LO) 
c SETS 
bi? ifizi. 


В v 
Here 0; is the effect of the j-th treatment X 0; = 0, is the overall mean of plot effects, 
jai 


and оў and c? are respectively the mean squares (of plot effects) within and between 
the blocks. From Section 3 onwards, we shall make the further assumption that 
the joint distribution of the random variables y;,’s is multivariate normal, with 
firs& and second order moments given by (1.1) and (1.2). We shall write 

p = «еф. 2 03) 
The problem is to estimate the parameters 0's and оў and oc. 


k 
Let Х ты = лу, the number of times the j-th treatment occurs on plots 


3 A b - 
in the i-th block. Thus ^j =1 or 0 and X п =k, Ў ny = т. The vxb matrix 
j=l {=1 


М = ((n;;)) is called the incidence matrix. 
We shall denote by Emxn a matrix of the form mx n, each element of which 
is unity. The matrices 
СЕТ ММ ad €, — 1 NN'—7 Е : (LA) 
z Peu ji Ё" El 
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play important roles in the analysis. We shall assume that the matrix С is of rank 


(»—1) : this is equivalent to the assumption that the experimental design is connected. 


A linear function © ашу is said to be a contrast if E a, = 0. A contrast 
iu фи 


is said to belong to blocks, or simply called an inter-block contrast if a; = dj = *** = ip 
holds for alli. A contrast is said to be an intra-block contrast if X аш = 0 holds for all 
i. A linear function X aj,¥;, is said to be normalised if Ў aj, їз 1. "Two linear func- 
tions È ашуы and X pr are said to be orthogonal if X audis = 0. Itis easy to see 
that any inte DUE contrast and any intra-block HEAR are mutually orthogonal. 


The rank of the vector-space generated by all inter-block contrasts is (0—1) and 
of that generated by all intra-block contrasts is b(k— 1). 


Let B; denote the total yield for the i-th block and 7’; that for the j-th treatment 
and let G be the grand total, so that 


B; = ELT Т; = n уати and G= z Viu: esr (15) 


We shall use the row-vectors В = (В+, Bs, ..., Bj, and T = (7, о ДАН 
adjusted yields for the treatments are defined as 


Q= T— ВМ. ... (1.6) 


Let, further rs 1 BN Ew a (1л) 


It can be seen that the elements of Q are intra-block contrasts and those of О, are 
inter-block contrasts. 


1947) that minimum variance unbiased 


It is known (see, for example, Rao, Р 
Өз, --- Oo) based on intra-block 


linear estimates of the treatment effects 9 = (A, 
contrasts only, are obtained from the equations 
0С = О. 
We shall write 6* for the solution of these equations. If the ratio p = 110% ів known, 
both intra-block and inter-block contrasts can be used together, and minimum vari- 
ance linear unbiased estimates in this case are obtained from the equation 
4 


(с +20) E 94,9. 


(1.8) 


(1.9) 


ted by (p). When p is not known, an 


The solution of these equations will be deno 
taken as an estimate for 9. 


estimate p* for p is substituted in (1.9) and 6(p*) is 
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For estimating p, the following procedure is generally recommended. (See 
Yates; 1939, 1940 or, for a general treatment, Rao, 1947). First the following table 
of analysis of variance is prepared : 
TABLE ANALYSIS OF VARIANCE 


source м d.f. 8.8. 
blocks (unadjusted) 5—1 55% = r BB’ —@2/bk 
treatments (adjusted) v—1 SS, = 00* 
error eg —bk —b—v--1 SSg = SSp—SS§—SSip 
Л ил з ООЛ ООВ ОНР 
total bk—1 851 = Е уйи 92106 
te 


The adjusted sum of squares due to blocks is then computed as 


88, = 883+88,,— Е тт’ вы). ... (110) 
Then sj and 8% defined by 
88 = SSples v(r—1)s] = kSS,—(v—k)s2 E. (IN 


provide unbiased estimators of с? and of respectively; and as an estimate of р one 
takes 4 
В = 81188. ... (1.12) | 
Tf the blocks are formed so as to achieve homogeneity within blocks, we expect 
p > 1; but depending on fluctuations of sampling, R may not satisfy this inequality. 


For this reason, a modified estimate R’ (which we shall call the truncated form of R) | 
given by 
"NA (1 f R<1 Gan 
= i вн R21 D . 


has at times been recommended. 


2. CANONICAL REDUCTION 

The assumption that the rank of € is (0—1) implies that there is exactly one 
latent root of the matrix NN’ which is equal to rk, and all other latent roots are strictly 
smaller than rk. Let £, s = 1, 2, ..., q be a set of orthonormal latent vectors of NN’, | 
corresponding to the 4 positive latent roots ¢,, allsmaller than rk. Let čes s = q+., | 
9-2, ...,0—1 be a set of (v—1)—g orthonormal 1хо vectors, each orthogonal to 
fois. Ё, and also to E. We then define (v—1) intra-block contrasts zy; 
8 = 1,2,..., 9—1 as follows : 


| k(rk—ó,3Q&; for s—1,2,..,q 
d = 


riok; for s— q4-1, q4-2, ..., v—1. 
Since the rank of the vector-space generated by all intra-block contrasts is (0—1), 
we can find é, = b(k—1)—(v—1) mutually orthogonal normalised intra-block contrasts, 
call them Zos, з = 1, 2, ..., ey, each orthogonal to у, 20 


(2.1) 


deg 
272 


RECOVERY OF INTERBLOCK INFORMATION 


Next, we define q inter-block contrasts 
2, = (kb) 3BN'E; в = 1,9,...‚0. a (2:2) 


Since the rank of the vector-space generated by inter-block contrasts is (b— 1), we can 
find e, = (b—1)—q mutually orthogonal normalised inter-block contrasts; call them 
2155 8 = 1,2,...,€,, each orthogonal to £in 21, .- tig. Finally, let 


G* = (0039. КЕ ; s. (233) 


By straightforward algebra, опе сап easily verify the following results. 
2.1. The linear transformation from yis to G*, tals = 1, 2,...,0—1), 
(в = 1, 2, ..., Ds Zo (8 = 1, 2, бу) and 25, (8 — 1; 2; ..., 61) is normalised orthogonal. 
2.9. The transformed variables are all mutually uncorrelated and their 
expectations and variances ате: 


(r—9,]k)) for 8= 1, 2,...,9 
Exo.) = ат, where do, = Оо 
rt for 8—q4-1,q4-2, ..., v—1, 
Е(х,) = ayr, where a, = (p/h) for 8=1,2,..., 9, ... (2.5) 
where 7, = 08; в=1,2,..‚ 0—1. .. (2.0) 
Elea) = 0, for s—1,2,.5,€9 —H()—0 Югв=1,2,..‚% 
E(G*) = (biu Sn Cbr) 
(хь) = 0%, fo s—1,2,.,—1 И@„)=0ф fors—lL2,.56 + (2.8) 
V(z,)-—02, for s=1,2,...59, Ves) = 03, for 8 = 1, 2,...,6 
7(@°) = 0. es (2.9) 
2.3. The equations (1.8) are equivalent to т, = 1, where 
t, = 2, for s=1,2,...,0—1. ». (2.10) 


2.4. The equations (1.9) are equivalent to T, = 1(р) where 
c (оаылы-Е@ь®ы)|(раф +а» X for s = 1, 2, 39 \ (2.11) 
pu 20100 for в = q--1, q4-2, ..., 9—1. 


2.5. The error sum of squares in the Table may be expressed as 


88, = È 1, = 8, sey. 2. (212) 
ia 


2.6. The adjusted sum of squares due to blocks defined by (1.10) may be 


expressed as | 
88, = в + E ват ... (218) 
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£i = 
where S, = Xd, 2. (214) 
sel 
and Zs = Uqyp—Aqs%y,[4,, for 8=1,2, ..., 4. ... (2.15) 
2.7. Let 
Wiulð) = Yiu T тыб; .. (2.16) 
B(0) = z W;,(8). ... (2.17) 
Тһеп SAB (кат) = iE вц) ... (2.18) 
8=1 
апа SES (Eos — ат, = = vi (6). 5 Вб) 


= Ыр 9—1 g DEO. (219), 


2.8. If the joint distribution of y;,'s is in addition multivariate normal, 
a minimal set of sufficient statistics for the parameters Ө, 08, and o? is provided by 
Zo, (8 = 1,..., 0—1), ту, (8 = 1, 2,..., q), So and 5,. TE p is given, £(p) (s = 1,2, ..., 
v—1) and V are complete sufficient, where 


8, 22 
V=S,+— t3 xd MALLES А. (2.20) 
“ср + 1-Ераф,/аў, 024 


2.9. When р is known Ї,(р) as defined by (2.11) is the unbiased minimum 
variance estimator of 7, and its variance is given by 


pal (pas,--ai,) for s — 1, 2, "99 
У t(p) = 
сё/а8, for s=q+l,..., 0—1. 


(2.21) 


3. MAXIMUM LIKELIHOOD ESTIMATES 


Under the assumption that the joint distribution of the random variables 
Y's is multivariate normal with first and second order moments given by (1.1) and 
(1.2) it follows that the likelihood function L is given by 


log, L — const— | @—1) log, o--b(k— 1) log,o34- a {2 У (tis — aiT)? +S } 


+ {® = (Eos —tosTs)?+ So |] es (Sek) 


where 5, and S, are defined by (2.12) and (2.14) respectively. In all subsequent sec- 
tions of this paper, we shall assume the joint distribution to be multivariate normal. 
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The likelihood equations, obtained by equating to zero the partial derivatives 
of log, L with respect to the parameters, turn out to be 
T,=, (p) for s= 1, 2,...,0—1 (8:9) 
where #,(p) is defined by (2.11) and 


БЕ) = SE (ctr ae (83) 
0—10} = S+Ë (ат) 084) 


The diagonal elements of the information matrix are 


| aog taora for 8= 1,2,..0 
L(t; т) = 
О E о = q41,g4-2,..,9—1 .. (3.5) 
1(03, оў) = Ш * ... (8.6) 
1(0?, 0) = 30—1)o1* (3.7) 
and all non-diagonal elements vanish. 
We thus see that the maximum likelihood estimate of т, is т, = = (p) where 


p is the maximum likelihood estimate of р. To compute р we note that it can be 


expressed as 
3 b(k— NIS+ X utu 
prs uU ped ыш ые (3.8) 
(b— n| 6+ 2 55 (rg tu ] 


We therefore use an iterative procedure. "'tarting with some suitable approximation 


for 7, we obtain a first approximation for р using (3.8). This value of р ів used to 


which, in turn, when used in (3.8) provides a second 


obtain improved values for Ty 
ure is continued till one gets stable 


better approximation for p. This iterative proced 


values for 7/8 and p. 
In aetual computation, we do n 
variables, but make use of the result 2.7 in 


> вне") | 
6—1) È n -:m 097,1" 5 [600 p вө") | 


ot work with the transformed canonical 
Section 2. The iteration formula then is 


(3.9) 


(n — 


where 0") = [0(9, 09, ..., O] is ne n-th ДУ Ача for 0, obtained by solving 
the equations 
e(c + с!) = Q+ E Q.. .. (8.10) 


E -) 


As a first approximation for Ө we may take its intra-block estimate. 
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The asymptotic variance of р obtained from the information maírix is: 
VER S RE 
VO) = ky Ee (311) 


The right hand side of (3.11) serves as a lower bound for the variance of any unbiased 
estimator of р. 


4, QUADRATIC ESTIMATORS 08 AND 0? 


Since the maximum likelihood estimates are somewhat difficult to compute, 
we may restrict ourselves to quadratic estimators for c? and cj. We notice that 
the transformed variables zo, (s = 1, 2, ..., ej), 21,(8 = 1, 2, ..., е) and z(s = 1, 2,..., q) 
defined by (2.15) each have expectation zero and they are mutually uncorrelated. 
The variances of 2, and 2,’s are given by (2.8) and (2.9) and the variance 
of z, is 

V(a) = o$--e,01 2. (41) 
where с, = ад,[а?, = (rk—$,)|$.- © (4.2) 


“Obviously, we need consider only quadratic forms of the diagonal type 
Q= US HS аа ... (4.3) 
з= 


where бу, b, and а, (s = 1, 2, ..., 4) are the coefficients to be determined. The ехрес- 
tation of Q is 


E(Q) — ( botot а, oit (biat в, ) c. ... (4.4) 


The variance of Q, under the assumption that the y;,’s follow a joint normal 
distribution, is 


VQ) = 2 | (0802) +2p аре ( Ме + Зак )] of... «9 


Tt is therefore possible to choose бу, b, and а, (s = 1, 2, ..., 9) so as to make О an unbiased 
estimator of o§(or of o1) with a variance which is minimum for a given value of p = 03/08. 
This gives, for estimating о? 


b, = (6+В;)/А, b, =—B,/A ... (4.6) 
and for estimating oj 

bo =—A,/A, b, = (Cy +Ay)/A el (4.7) 
and in either case 

a, = (5s--p*b,e,)Q1--pe,)* сав 
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where 
1 
A, = (+, A= P e (Lege) 2. (49) 
By = È often), B, = P аро)" 
s= 8=1 
and А = («А+ В) Bad. 


The case where p is large is of special interest. In such a case the term involv- 
ing p? in V(Q) would be dominant, and we may like to minimise this term. The 
optimum unbiased estimates of тб and 0? in this sense are given by 


0% = б — (410) 
р) je 2 
ИЕ b Lits [8+ it ] es (411) 


respectively. In aetual computation we make use of the fact that 
a2 1 св 
^ — X В0*)— ... (4.12 
= "E 10") т (4.12) 


where B,(0) is defined by (2.17) and 0* is the intra-block estimate of Ө obtained from 
(1.8). ТЫв estimate was suggested by Shah (1962) from intuitive considerations. 


The variance of v, is 


O E о m 


where == X ( -% ): vee (4.14) 


8=1 


We may compare this with the customary estimate 8? of 0? as defined by - 


(1.11). The variance of this estimator is 


Vet) = слу (ert ane Bera) 


+1 ( 1 ) fs]. ... (418) 


5. UNBIASED ESTIMATORS OF p 


As a convenient unbiased estimator of p we may consider а statistic of the form 
(5.1) 


S a8,--X bs +e А 
Sp у 
‚ q) and с are constants to be suitably determined. Since for 


T 


where a, 6,(8 = 1, 2, ... 
бф > 2 - 
b,(1+ pes 
Р) = паа IAEA. +0, 
6—2. 
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to make P an unbiased estimator of p, we must have 


Zhe е0, 
and ae,+=b,c, = &—2. ... (5.2) 
If в, > 4 the variance of such an unbiased estimator turns out to be 
ИР) = A, + Ар- Asp", ... (5.8) 
2 
where Ay = 8 E UE} х0, ( zh ) 
0 


_ 62 bo, +2ae, b,c, 22, 
(e9—2)(€9— 4) 6—2 


А, 
_ еце -3)+-35 008 у 
Se тоет нии 


If we like to minimise A, the coefficient of р? in (5.3), we have to take 


3(6—2) p (ata. ... (6.4) 
sateet? ' Зс, 


It can be seen that R given by (1.12) is not an unbiased estimator of р, but 
a simple correction can be applied to it to make it unbiased. We thus get 


йл е 840—0 3 
(1 2) 8 CD ... (5.5) 


as an unbiased estimator of p. 


ý Similarly, if we start with v, and v, defined by (4.10) and (4.11) as estimators 
of оў and o respectively, we get, as another unbiased estimator of p : 


(-2)2- 283) (р). = eo 


where Е is the efficiency-factor of the design (Kempthorne, 1956; Roy, 1958). 


v—1 v—1— 1 1 

к-С x ll rk : E a азе 

Since with positive probability these unbiased estimators of р may turn out 

to be less than unity, we may use their truncated forms instead, as indicated by (1.13). 

Let x be any unbiased estimator of p and x’ its truncated form defined ут =1 if 

æ <1, and 2' = v, otherwise. Then, even though 2’ is generally a biased estimator 
for p, it can be easily seen that its mean square error can never exceed that for x, 


Ер) < Бер) 
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6. SOME PROPERTIES OF COMBINED INTRA AND INTER-BLOCK 


ESTIMATORS OF TREATMENT EFFECTS 


As a corollary to result 2.8, we conclude that when p is given, unbiased 
estimators of treatment effects with minimum variance are obtained from the 
equations т, = f, (р), в = 1, 2, ..., v—1, where the right-hand side is given by (2.11). 
In this section we shall investigate the properties of the estimators of treatment 
effects obtained by substituting some estimator p* for pin the above expression. For 
typographical simplicity, we shall write 


= (р) 8 =,(p*). 


e Ede, 
We then have the following : 
Lemma 6.1: Jf p* satisfies the conditions 


Leva pa ee, di $62). 


E(w,) = 0, V(w,) «co .. (6.2) 
for all values of p, then ЕЕ) = Ts x 
and ИЕ.) = Vira V(w,). jn) (6.8) 
To prove this, we note that 
Uem ... (6.4) 


ВБ №. 
" Hr ал(1--ре,) я 
i, is the unbiased minimum variance estimator of and by the 


Also, when p is given, 
function. By Stein’s theorem (1950) 1, and w, 


conditions of the lemma w, is a zero 
are uncorrelated. Hence the lemma. 


Let P be any statistic of the form (5.1) and let p* be defined as 


if P21 
р* = ү 3 ў ... (6.5) 


1 otherwise. ў 
Tt can then be shown that р* во defined satisfies con itions (6.2). That (и) is finite 
can be easily checked. То show that E(w,) = 0, we note that p* is an even function 
of ду, (8=1, 2, ..., ё), Me (8 = b 25 o еу) and 2, (в = L 2, q) and consequently, 
w, is an odd function of these variables. Since the z’s are mutually independent 
random variables each having & normal distribution with mean zero;the result follows. 
This is merely an extension of Graybill and Weeks (1959) argument for balanced 
designs to the case of general incom; lete block designs. A similar argument gives 

E(w) = 0 


for s #s' = 1,2,..,q. Since f, and îy are independent, it follows that 7 and 7, are 


uncorrelated for s 52 8' = 1, 2, ..., 0—1. 
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Now, any treatment contrast т can be expressed аз т = X lT, where 
lj (s = 1, 2, ..., v—1) are some constants. The minimum variance unbiased estimator of 
T when p is known is = Xl, When р is not known, for a combined inter and 
intra-block estimator of т, one takes ?* => lī; by substituting a suitable estimator 
р* for p. If p* satisfies the conditions of Lemma 6.1, we get the following : 


Theorem 6.1. The estimator Ї* is unbiased for т, and its variance is given by 


ўжү — LH V(w,) 
7 = ТО ym 


the second term being the additional variance due to the sampling fluctuation in p*. 
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AN ESTIMATE OF INTER-GROUP VARIANCE 
IN ONE AND TWO-WAY DESIGNS 
By K. R. SHAH 
Indian Statistical Institute 


SUMMARY. A procedure for estimation of inter-block variance has been suggested as an alter- 
native to the usual procedure given by Yates and others. A computational procedure to obtain this esti- 
mato is also given. The variance of this estimate is compared with that of the usual estimate when p, the 
ratio of the inter-block variance to the intra-block variance is large, say р 2 Po, the estimate considered 
hore has a smaller sampling variance. A table showing these values of po is given for all BIB designs listed. 
by Fisher and Yates, Similar procedures for estimating inter-row and inter-column variances are given 
for designs where the experimental materialis symbolically arranged in rows and columns to eliminate 
heterogeneity in two directions. 


1. INTRODUCTION 


Yates (1939, 1940) suggested the use of inter-block contrasts for estimating 
treatment effects. In order to combine these estimates with those obtained from 
intra-block contrasts, one needs knowledge or at least the estimates of variances of 
both inter and intra-block contrasts. The variance of a normalised intra-block 
contrast which we shall call intra-block уа iance is estimated by the error mean 
square in the ordinary analysis of variance. The variance of a normalised inter- 
block contrast which we shall call the inter-block variance was estimated by Yates 
(1939, 1940), Nair (1944) and Rao (1947, 1956) using the block sum of squares 
adjusted for treatments. 


In the case of two-way designs where the experimental material is symboli- 
cally arranged in rows and columns, for the same purpose of utilising inter-row and 
inter-column contrasts for estimating treatment effects, Roy and Shah (1961) gave 
procedures for estimating inter-row and inter-column variances. The estimates 
considered there are analogous to the estimate of inter-block variance given by Yates. 


In this paper an alternative estimation procedure is put forth for the estima- 
he experimental material is grouped in blocks 


tion of inter-group variance, where t £ 
as in one-way designs or in rows and columns as in two-way designs. In the case 


of two-way designs this procedure turns out to be computationally simpler. 


2, PRELIMINARIES 


Consider an experiment in which v treatments are tested on bk plots, arranged 
у one treatment, and each 


in b blocks of Е plots each, such that each plot receives exact 
treatment is applied atmost once in а block and altogether in r blocks. Let уш denote 
` the yield of the u-th plot in the jth block. It is assumed that 
X Өт, (2.1) 
Уш = ncc Ё Bm t- Eiu 
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where д is the general mean, fj the effect of the i-th block, 6; the effect of the j-th 
treatment, ту, = 1 or 0 according as the u-th plot in the i-th block does or does not 
' receive the j-th treatment and e;, is the experimental error, i = 1, 2, ..., b; j = 1,2, 
.. 2} U=1,2,...,% The general mean ш and the treatment effects 0; are regarded 
as unknown constant parameters, subject to the restriction that X0, = 0. The block 
effects гв and the experimental errors є’ are taken to be independent random 
variables, each with expectation zero and variances given by 


(В) = ob, Увы) — o8. for all (i, u), t Ga 

we shall write оў = оў-+-Коў, р = oiloa. „. (2.3) 

Let Xmj,- пу, the number of times the j-th treatment occurs on plots 

in the i-th block. Thus n; = 1 or 0 and Iny =r 2 ty; =k. The vXb matrix 
N = ((nj)) is called the incidence matrix of the design. 


Denote by B; the total yield of the i-th block, by T}, the total yield for the 
j-th treatment, and by G the total yield of all the plots. Thus 


В, = ЗУ T = Уту, O= У Yu; + (24) 
u i u x i u 


we shall use the row vectors B = (B,, В,,..., В,) and T = (T, , T,, ..., Т,), we 
further define 


uon И eS, DISCI 
Q = T—L BN', Q, = LBN'— TE, P= B-1 TN 
С=пт-1мм,с, = 1мм  E,D-—KkI-lNN —.. (25) 
k ЕВ bk т 


where Emm stands for а matrix with m rows and n columns with each element unity. 
It is easy to verify that 


E(Q)—9C, Е(О,) =0С,, ЕР) = 0, 
V(Q) = €o$, V(Q,) = Cio}, V(P) = Бо Оо, e. (2.6) 
where Ө = (0,05, ...,0,) is the vector of treatment effects parameters. 


As is well known (Rao, 1947) the intra-block equations for estimating 
treatment differences are 
О —90C $e (222) 


giving us 6= QC* as a solution, where we put A* for the pseudo-inverse (Rao, 1955) 


of the matrix A. It is also shown by Rao (1947) that the combined inter and intra- 
block equations are 


0+0, о(с+-с,). e. (28) 
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Since p is usually unknown, to solve these equations one uses an estimate of 
p obtained by taking the ratio of the estimate of о? to that of $.[(Yates, 1939, 1940); 
(Rao, 1947)]. An estimate of 0? is provided by MS, the error mean square in the 
ordinary analysis of variance. We now consider the problem of estimating oj. 


3. ESTIMATES OF of 


Estimate of o? using sum of squares due to blocks adjusted for treatments 
was obtained by Yates (1939, 1940) for special designs and was later adopted by 
Rao (1947) for general incomplete block designs. This can be put in the form 


Le kPD*P'—(v—k)MS, ... (3.1) 
v(r—1) 
1 1 
(B—6N)(I—= EBON) —4M8, (3.2) 
Another estimate is Е» = 6—0 Cert 


where a is the trace of the matrix C*NN’, which can be shown to be equal to k(v—1) 

(1—E)/E where E is the efficiency factor of the design (Kempthorne, 1956; Roy, 1958). 
Under the additional assumption that the random variables fs and єш8 

are jointly normally distributed we shall derive the variances of E, and Ез. Now, we 

state the following lemma used in deriving V(Ei) and Ү(Е;). 

.2,) has multivariate normal distribution. with 


Lemma: If æ = (t1, % .. 
if Ais a matriz such that there exists an ortho- 


mean © and dispersion matriz А and 
gonal matrix P satisfying 
PAP’ = diag (Ay Ag; n As) P ^ P' — diag (à; бу, sey On) 


then E(xAx') = > [2n (3.3) 


(аЛа) = 2 z [2 
The proof is immediate if we transform from æ to Z = аР". 


e observe that IS, is distributed independently of Р, В and 0. We 


First, w 
(В өй) = 103 


also note that В and 6 are independently distributed. Hence И 
+. N'C*NoÀ. 
Using the Lemma we get 


bi 1 
с ы. 


2 VW. IT NC 
VE) = xp [^£ (3 crtis ze t) 
-zero latent roots of D (when all treatment contrasts 
cted, the matrix D has exactly one latent 


and 


where 21, to, ..., 25-1 ATO the non 
are estimable, i.e., when the design is conne 
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root zero. Here, we shall consider only connected designs) and e stands for the 
number of error degrees of freedom in the intra-block analysis. 


Since @ = Xa; /(б—1) = v(r—1)/(b— 1), it is easy to see that 


Y0)- = ту [3 [eret menie] 
(v—k) a7? 1oi 2 
eatem ses 
On simplification this gives 
2 

Ү(Е;)— (Е) = war 8-0-0) х1) 
ert! 1 (9—8) a2 ; 
toto и, ul] ve» (8.0) 


In this expression oj has positive coefficient while сё and ofo} have negative 
coefficients. Hence V(H,)—V(H,) is positive for somewhat large values of 03/03 or 
equivalently for somewhat large values of p — 01/03. By po, we denote the value of 
p such that for p> ро, V(E,) — V(E,) is positive i.e., the usual estimate has larger 
variance than the new estimate. The values of p, are given below in the table for 
all BIB designs listed by Fisher and Yates (1957). For each of these designs ро 
happens to lie between 4 and 5. 


VALUES OF ро FOR ALL BIBD BY FISHER AND YATES 
(Other than Symmetrical Designs) у 


k r b v Po k r b v Po 
Бо С РИ ee 

3 6 10 5 4.3499 5 10 18 9 4.1795 

3 5 10 6 4.9828 5 9 18 10 4.1875 

3 4 12 9 4.4179 5 7 21 15 4.2225 

3 6 26 13 4.6270 5 6 30 аи 1900 

5 10 82 41 4.3435 
3 9 30 10 4.6801 
0822 
3 7 35 15 4.7080 6 x a тё 
X a Е 6 9 15 10 4.1062 
7 19 4.8091 : 5 rd оя 
3 10 70 21 4.8463 6 8 28 21 4,1560 
^ 3 6 9 69 46 4.9142 
м 4.2620 в 10 85 51 4.2406 
4 10 15 6 4.2044 Н E 2. Я 
4 6 15 10 4.2640 * а AEN 
1 36 98 4 
4 8 18 ; 
о ао 1 8 56 49 4.1157 
4 
5 20 16 ^ 4.2584 E 10 45 36 4.1040 
4 8 50 25 — 4.4464 8 9 72 64 4.0976 
4 9 63 28 4.4832 9 10 90 81 4.0672 


ШААНИ. 
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It may be noted that the two estimates are identical in the case of 2; = 23 
= ... = ay іе. when the design is a dual of a BIB design. Hence in particular for 
symmetrical BIB designs the two estimates are identical. ? 


Other properties of E, will be discussed in a subsequent communication. We 
derive here a simple computational procedure to obtain E, after the intra-block 
analysis is performed. 1% is easy to see that 


(B—8N) (1— E (B—0N)' = kS85—BN'6 --6NN'6 „. (8:7) 


where SS} denotes unadjusted block sum of squares. On simplification this reduces 
to iSSt—(2—r6)0]. Once, the intra-block estimates $ are obtained, this is easily 
computed. ': Р 


4. DESIGNS WITH TWO-WAY ELIMINATION OF HETEROGENEITY 


A similar problem for designs with two-way elimination of heterogeneity 
would necessitate estimates of three variances. Here, if the experimental material is ' 
arranged into a two-way array of rows and columns one has to estimate inter-row vari- 
ance, inter-column variance in addition to the usual error variance which one may call 
interaction variance. As usual, the estimate of interaction variance is provided by 
MS, the error mean square in the ordinary analysis of variance. A method of obtain- 
ing estimates of inter-row and inter-column variances was given by Roy and Shah 
(1961). Тһе estimates given there are analogous to the estimate of inter-block variance 
given by Z, and to compute them one has to carry out two additional one-way analysis, 
one with rows and treatments ignoring columns and the other with columns and treat- 
ments ignoring rows. In this case, it would be simpler to compute the estimates of 
inter-row and inter-column variances corresponding to Е, in the one-way case. 


If o2 denotes the variance of а normalised inter-row contrast an estimate of 


0% is given by p 
4, 8-0) (1-} Enn) (R-tMy eain ars, (m 
i 
n(m—1) 


where all the quantities defined in the ВН.5. are as defined by Roy and Shah (1961). 
Using the same notations, the estimate of g2, the variance of а normalised inter-column 


contrast is given by 
(см (1-4 Bm \(0—ENY—inE*NN')MS, 
A$. = х 
Er m(n—1) 
After the interaction analysis is performed 65 and 6? can be easily calculated. 
'The author is grateful to Dr. J. Roy and to Dr. 8. К. Mitra for helpful 


discussions. 


(4.2) 
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By P.K. PATHAK 
Indian Statistical Institute 

SUMMARY. Insimple random sampling with replacement, Basu (1958), and Des Raj and Khamis 
(1958), showed that for estimating the population mean, the average of distinct units is more efficient than 
the overall sample mean. In this paper, a detailed treatment of the above problem is given, and the exact 
expression for the variance of above estimator is derived. ‘The relative efficiency of the above estimator 
with other estimators is also considered. An improved estimator of the population variance is obtained. 
Finally, а comparison between the two simple random sampling schemes (with and without replacement) 
is mado, Ё : 


l. INTRODUOTION 
We index the N population units as 1, 2,..., М, and let Y; be some real- 

valued characteristic (in which we are interested) of the j-th population unit.* Here 
we consider the problem of estimating the population mean 

Y=N=2 У, 
and the population variance о? = N3 X (Y;,—Yy. 
For simplicity we refer population units iby capital letters and sample units by small 
letters, e.g., и; and y; will denote the unit index and the variate value respectively 
associated with the i-th sample unit. 


2. ESTIMATION or Y 

In simple random sampling (with replacement), Basu (1958) considered two. 
estimators of the population mean 

(i) ў = 1/® X y, = average of n sample units; 

(ii) J, = 1/v B-yq == average of v distinct units observed in the sample. 

If we record the г. of observation as ^ — 

= (ш, Vg, ens En) у 

where 2; = (Y; u;); and if v be the number of distinct units ое їп the sample, 
Basu (1958) showed that the ‘order-statistic’ (sample units arranged in ascending | 


order of their unit-indices) 
= [ta (о), +++ 2] 


(where xu) = (уу, шуру), and yq is the variate value of the sample unit with unit index 
t) forms a sufficient statistic, and therefore, for any convex (downwards) loss function 


E(g|T) = Ey, |T) = 9, s. (2.1) 
has uniformly smaller risk than 7. s | 
An exact expression for variance of J, is given below. 


Variance of $». We have 


cil e RA LE 
у) = БФ) = а (1-3) s (22) 
Wee вии. 00 i 
* j runs from 1 to М; i from 1 to n; and (i) from (1) to (v). .— En 
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: т) таратар. PA 
Since E G)- E ‚ (Pathak, 1961) 
VG.) = ецш аба, ` (2.8) 


For large samples, it is rather cumbersome to compute Ү(ў„). An approximate 
expression for V(g,) valid for terms up to order N-? is given by 


Y) = |>- t =: .. (24) 


3. ADMISSIBILITY PROPERTIES OF CERTAIN ESTIMATORS OF T 


Let I' denote a certain class of estimators of Y. For a given loss function, 
_ let R(t) represent the risk (or expected loss) associated with the estimator ¢ of Y: 

Of the two estimators ¢, and t, of Y, 4, will be said to be uniformly better than 
t, if, for a given loss function, 

R(t) < В) es (34) 
holds for all possible values of (Y,, Y5, ..., Y y) with strict sign of inequality holding 
for at least one (Y, Y,..., Yy). 

An estimator ¢ belonging to Г is said to be admissible in Г if there exists no 
estimator in I" which is better than t. 
Now we consider the problem of finding admissible estimators of Y. Аз 
the ‘order-statistic’ T' is sufficient, we have to restrict ourselves to functions of Г 
only. Moreover, the distribution of T' is not complete, therefore, many different esti- 
mators of Y can be suggested. For simplicity, we shall consider the following class 
of unbiased linear estimators of Y. 
= fv) 9,65). ... (3.2) 
In view of the fact that ЕЯ, | v] = AË У), 
obviously, necessary and sufficient conditions for 7, to be an unbiased estimator of 
Y, are 
ЕЛУ) =1 and Effy) = 0. E33) 
Consider now the class Г of estimators g, which satisfy the conditions of (3.3). 


Now va) = E[re( 2 = x)*] EVA) f. | s (34) 


In order to choose a good estimator from Г, we are to minimise (3.4) by proper 
choices of f,(v) and f,(v). The first expression on the right hand side of (3.4) is inde- 
pendent of f,(v); so, for a proper choice of у), we ате to minimise 

VIA f) 
which is minimum if Y f (y)--f,(v) is constant for all values of v, i.e., 


УЛ) = ЛОРА) = У 


лш с МЕ CES 
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Since the above solution of f,(v) contains the unknown Y, the exact value of 
Ду) is not known unless /(у) = 1. Thus, if we choose f,(v) = 1, the best estimator 
of Y would be ӯ, However, in practical situations, when some a priori knowledge 
about Y is available, it seems appropriate to approximate f,(v) by 


Му) = Х[1—/(У)1 ... (3.6) 
where X is.some a priori estimate of У. For example, X may be taken as the estimate 
of the population mean of the same variate obtained from some previous survey 
ete. On the other hand, if no such information about У is available, № would be 
safe to take f,(v) = 0. То choose the optimum value of (у), we have to minimise 

De 
a yea 
Bf) (т) 
subject to the condition that Z[f,(v)] = 1. 


By Schwartz inequality we have 


пло) р em 
The equality holds if and only if 


flv) = (= -y) N [= 11) <] МУКУ)... (3:8) 


Thus, when some а priori estimate X of Y is available, the optimum estimate 
of Y is given by 
ГУ —)] 


3 ININ —v)] NIN) 3. 3.9 
9а = — ERA sex [| oer | e 


When no such information about Y is available we may use the following 
estimator : 
ININ 2 .. (810) 
РИ 
"* HNN») 

The two estimators are admissible in Г in the sense that they minimise the 
first component of (3.4). Any estimator ӯ, different from either of them cannot be 
uniformly better than „(уу or Fr because 

(Фа) <V(g,) for all populations where У = 
Y) < V(g, for all populations where У =0. 

Expression for Е[Му|(М—у]: Proceeding on similar lines given by the 

author (Pathak, 1961), it can be shown that 
p 
E{Ny|(N—v)] = № P Wen: ... (3.11) 
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ЗЕЕ ЕБЕ 
0 otherwise. 

Thus, we see that it may be quite cumbersome to compute the estimators (3.9) 
_ and (3.10) in case of large samples owing to the difficulty of computing Е [Nv/(N — v)]. 


If, however, the sampling fraction n/N can be ignored, the estimators reduce to 


№ ЕЕ у ]. ; 
Ja =EN) %+Х 1-59] ; (3.12) 


5* У = 
Yri) = ЕС) У». (3.13) 


Tt is easy to see that (3.13) is the well-known Horvitz-Thompson (1952) esti- 


mator in case of equal probability sampling. An interesting comparison between 
(ӯ) and V(g,) is made below. 


4. COMPARISON BETWEEN (Vj, AND V(j;o) 
We have shown that 


Vij) = ET. 8°; 


and Yi) = BLY (5909) +7 [2 G7] 


= [xa а Т(у). E (41) 


у М 
It сап be shown that 


к-т 
ш) =®[ї—(1—у) |Ha- [1-2(1-5)'+(1-Z)"]; 
СЕТЕ 


Mo = appl (a) a 


-—(N—1) fi—2(1— аў + us i ) ү | i- ( eT 


СЕЕ КЕЕ 
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Now 14" 24^ 
ТА. cg we NW 1——| —(1—-++ 
V(g,)—V(o) = 8% + ud : 211 ER | 
Enn 
a b-(-y)T 
= 0,92—0,72. (say) PCE) 
Thus Я, is better than gj if 
and worse if © > = 
1 


Approximate values of C, and С, for large populations correct up to terms of order 
N-?, are given by 


203 150—1), 
0,— USAID ; 
_ (@—1)_(%—1)(@—?) 
O = ——, Қа“ ... (4.4) 


The above comparison shows that if the square of the population coefficient of 
variation exceeds (n—1), then jj has smaller variance than g,. Moreover, if we 
have some a priori knowledge of Y, № would be more pertinent to compare ў» and 
а). № can be seen on similar lines that Я, is better than Fro if 


$ 2% 
т < 0,’ 


and worse otherwise. This result shows that if X provides a close approximation 
to Y, it is always better to use Tya rather than Jp. А 

We now state the following admissibility property of ў». 

Theorem 1: If squared error be. the loss function, ӯ, is admissible among all 
functions of F, and э. . 

Proof: Let t= d, fü» У) 
be a function of j, and v. Suppose that t is uniformly better than ў». Now by 
hypothesis, E з 
R(t) = EG. —Y +E, Р-Я, —У) FG» WIS Eg, Y) .. (4.5) 
holds for all Уу, Yo, +++) Yx- Take in particular Y, = Y,=..=Yy=0 (say). 


Then the above relation implies that 
(О, v) = 9. e (4.6) 


Since the choice of C is arbitrary, it follows that fü» у) is identically zero, which 
proves the above theorem. : 
291 


SANKHYA : THE INDIAN JOURNAL OF STATISTICS: Зевез А 


5. ESTIMATION OF VARIANCE 


We now turn to the problem of estimating the population variance from 
a simple random sample (with replacement). The usual estimator of the population 
variance 
o = N13 (Ү,—Ү)? 
is given by the sample variance 


8% І 


УУУ E(y, —9) = 


: б 
E s —› 


In this section, we derive an estimator uniformly better than s?. 


Theorem 2: For any convex (downwards) loss function, an estimator uniformly 
better than s? is given by 


n -| SR] si 2. (5.2) 
$ i 
eue бт) = v-(1) (v—1)y4-... (=) i] r, 
1 5 * 
and . Ag. =p E(yg—-5? iv-1; 
0 otherwise. 


Proof: Since the ‘order-statistic’, T, is sufficient, by Rao-Blackwell theorem, 
an estimator uniformly better than s? is given by 


1 
ЕТ] = Е 3 и) | = Ел)... 63) 


When у = 1, (5.3) is obviously zero. То derive (5.3) when v — 1, we observe that 


X y". 0-2! (ri | 74 


< P[o, = tun % = t| Т] = N Gay! ... ao! 1 NN 


z am! = ! (y ) F (y ) = 


G4 =1,2,...,») 


(5.4) 


where У’ means summation over all integral œ’s such that 


May beaten = ® and у> 0 for i= 1, 2,..., v; 
and =” means summation over all integral a’s such that 
91-948)... Бау = n—2, 94) 20, о, > 0 and ay) 0; for ВА =1, 2,...,v. 
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It follows from Lemma 1 given by the author (Pathak, 1961) that 


, n! = E 
Я ЕТ бш); 

"ace р » 2 
х Е C,(n—2)--20, 4(n—2)4-O, 4т— 9) 

— 04) 0,(n—1) (5.5) 
v(v—1) 
€,(n)—O,(n—1 1 
n Pp, = аир t= mg) T] = EN 0 ... (5.6) 


би has), 
Thus, if v> 1 - 


ENS 
E [ ше iv ]- x Uo E y? Phy =, ® = zan T] 


= 009-0480) 1 зада] 


С, (т) 2v(v—1) 
б О Du d yo а (87 
Ооо У (yot) (5.7) 
E —0,(n—1 
Therefore, ог апу у,  E(s|T)— E | (n us T ]= €, "T ) в Tene (18) 


where 88 has been defined earlier. 


Dun , Os (n—1) 
In practice the estimator s; requires the knowledge of the ratio Ст) 


Table 3 gives values of Eum P correct to seven places of decimals for 1Km<n<50; 
(2 
т' 


and were computed from values of >. tabulated by Gupta (1950). 


The following results are direct consequences of Theorem 2. 


If there are two characters Y and Z, the covariance bet 
is defined by 


ween Y and Z 


| биг) = ү Ж%—#\®%—®). el (5.9) 


The usual estimator of oy, is given by 


Swa E Z(yi— 962. (5.10) 
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Corollary 1:: It follows from Theorem 2 that an estimator better than Sua is 
given by : 


— | E(n)—C.(n—1) i 
=. BE би) 
0 otherwise, 


where Syy 18 the sample covariance based on the distinct units observed in the sample. 

The above theorem can be used to derive an unbiased ratio estimator which is 
better than Hartley-Ross unbiased ratio estimator (1954). In the sampling scheme 
under consideration, Hartley-Ross estimator is given by 


n c ELA рр 1 Wl ч 
от :(2 ғ) a, 


where Z = N= X 7, T= l/n Хуа; and x is the value of the Z-characteristic, an 
auxiliary characteristic related to Y characteristic, of the i-th sample unit. 


In = rZ— 


Corollary 2: An estimator better than ӱр is given by 


=й C,(n)—C,(n—1) ERE ERES MES 
T,Z4- буп) . (v=) (9, —r,z) if v>1 a (5.12) 


A otherwise, 


89.171 = 


where р, р Шз; Zo 
У 2) У 


Murthy (1961) has extended the idea of ratio estimators to product 
estimators. Similar to the well-known ratio estimator -- he has considered the 


product estimator 
(5:13) 


SiS 


for estimating Y. 
Corollary 3: It can be verified that an estimator better than 2 is given by 


ту знана] вы 


Where Sv) із given by (5.11). 
Finally, the almost unbiased product estimator of Murthy (1961), is given 


which makes vs almost unbiased. "This estimator is given by 


pene Hell 1 Уу; 
DX 7 $1) “яй ... (5.15) 


Corollary 4: An estimator better than P, із given by 


ЕР, | T] = Hoto tua, 2. (5.16) 
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6. Some ESTIMATORS OF V(j,) 


Some unbiased estimators of V(g,) are given by 


[nlp 99-1 а ЖОКЕ 


© щщ =] = awa” 


; oy [ Мара, (МТ М  [6O,(»)—O,(n—1)]s , 
(11) 069) = [ N" | (N—1) O(n) a 


(ш) og) ea 


v о) (р) 1-1) ] (to be used for v> 1). 


The estimate (II) is known to be uniformly better than (I) It appears diffi- 
cult to give direct proofs of relative efficiencies of these estimators. The estimators 
(IV) and (V) were given by Des Raj and Khamis (1958). The estimator (V) is condi- 
tionally unbiased for v > 1. Des Raj and Khamis suggested the use of (V) for v > 1. 


It is easy to see that 


n 

= 0, ——_—., 6.1 

va = ®% (М) (6.1) 
A little comparison will, now, show that the conditional variance of (V) is less 

than the variance of (IV). The amount of decrease in the variance is given by 


Vie) Viv > = урул He. Ж 


In general, this leads to the conclusion that any estimator ô? of o? which is 
unbiased for o? and is equal to zero for у = 1, can be reduced to give a conditionally 
unbiased estimate of о? for у> 1 whose conditional variance will be less than the 
variance of 62. This conditionally improved estimator is related with ô? by the 


following equation. 


CEPS [ою] ss (6.3) 


where o2, stands for the conditionally improved estimator of g?. 
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Numerical example. То study the relative efficiency of the estimators of V(i.), 
we consider the following three populations given by Yates and Grundy (1953). 


TABLE 1. THREE POPULATIONS GIVEN BY 
5 YATES AND GRUNDY 


population A ев с 
unit У; Yj Yj 
1 0.5 0.8 0.2 
2 1.2 1.4 0.6 
3 2.1 1.8 0.9 
4 3.2 2.0 0.8 
BY; 7.0 6.0 2.5 


These populations were deliberately chosen by them as being more extreme 
than will be normally encountered in practice. 


The table below gives variances of unbiased estimators of V(j,) when n = 3. 
V(v,) is not given as V(v,) > V(v,). 


TABLE 2. VARIANCES OF UNBIASED ESTIMATORS OF V(Y) 


population Ү(Ү,) V(v2) Vis) V(vs) (|> >1) 
A 0.29823 0.04940 0.05222 0.09017 0.07897 
B 0.06125 0.00220 0.00232 0.00396 0.00348 


e 0.020964 0.000279 0.000293 0.000490 0.000432 


The results show that for the three populations 
ы) < Тб) < Vins |v- |) < И). es (64) 
Thus v, appears to be most efficient estimator of V(j,). 
For п = 2, v, and v, are identical. The comparison thus strongly suggests the use 
of v, for estimating V(i,). 
For getting estimators of V(t), where # is any unbiased estimator of Y, the 
following procedure may be adopted. 
v(t) = &—est (Y2), ... (6.5) 


where est (Y?) stands for an unbiased estimator of Y? and can be obtained from any 
of the relations 


est (F?) = v(g)—9? (= 1, 2, 3, 4, 5). el (6.6) 
From the example considered, it is expected that 
est (72) = o(9,)—g2 (i= 2, 8) © (6.7) 


would fare better than the remaining estimators of Y2. 
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.4920635 
.2990033 
.1857143 
.1071429 
0476190 


0 


15 


4999695 


.3321838 


.9453432 


.1901391 


.1507901 


.1206858 


.0965357 


.0764970 
.0594466 


.0446549 


.0316239 


.0200000 


0095238 


0 


.4960630 
‚3115942 
‚2057613 
.1333333 
.0789474 


0357143 


0 


16 


4999847 
.3325687 
„2465438 
.1922722 
.1538225 
.1245448 
.1011446 
.0817858 
.0053539 
.0511277 
.0386164 
‚0274125 
.0174419 


.0083333 


0 


.4980392 
.3193388 
.2189189 
.1510574 
.1005291 
-0606061 


‚0271118 


0 


17 


4999924 
.3328243 
.9474986 
.1939216 
.1562302 
.1276595 
.1049062 
.0861370 
.0702435 
.0565107 
.0444536 
.0337299 
.0240896 
.0153453 


.0073529 


TABLE3. VALUES ОР 2m(@—l) 
n— 
1 ~ — 1.000000 for all n í 
2 0 .9333333  .4285714 .4666667  .4838710 
3 0 .1666607 .2400000 .2777778, 
4 4 0 .1000000 .1538462 
5 0 .0666667 
6 0 
7 
9 
п ә 
т 10 11 12 13 14 
1 Я 1.000000 for all n 
2 04990215  .4995112  .4997557  ..4998779 4999390 
3  .9242229 .3273569  .3293923  .3307253 .3316032 
4 02978258  .2339066  .2383479  .9414585 — .2437059 
5 (1634568  .1723544 .1788676  .1837118 .1873611 
6 1159154  .1271791  .1355998  .1420028 .1469395 
т  .0785714  .0918937  .1019882  .1097724 . .1158027 
8 0480000  .0031313  .0747043 .0837155 .0908370 
9 0222022  .0389010  .0518519  .0619607 -0700084 
10 0 .0181818  .0322581 .0433566 0522416 
n 0 .0151515 -0271493 .0367965 
12 ? 0 .0128205 .0231660 
13 0 .0109890 
14 : 0 
15 
16 


0 
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T 
TABLE 3. VALUES ОЕ Üm"—D (Contd.) 
. Cm(n) 

ту 

т 18 19 20 21 22 23 24 25 

1 1.000000 for all n ч 

2 .4999962  .4999981  .4999990 .4999995 .4999998  .4999999  .4999999 .5000000 
3 .3329943  .3331075  .3331828  .3332330 .3332665 .3332888 .3333036 .3333135 
4 .2480833  .2485093 .2489308 ° .2492003 .2494015  .2495518  .2496643 2497151 
5 .1952045  .1962073  .1969942  .1976139  .1981031  .1984904  .1987974  .1990412 
6 .1581551  .1597031  .1609542 .1619699 .1627974 .1634737  .1040281 

7 .1301924  .1322655  .1339720  .1353836 .1365562  .1375340  .1383520 

8 .1080005  .1105635  .1126991  .1144881  .1159936  .1172659  .1183450 

9 .0897460 .0927608  .0952950  .0974369  .0992561  .1008081  .1021373 

10  .0743243  .0777549 .0806574  .0831272  .0852394 .0870540 .0886193 

ll  .0610250  .0648389  .0680820  .0708560  .0732409  .0753009 .0770878 

12 .0493679 .0535361  .0570949 .0601513  .0627902 .0650794 .0670738 

13 .0390147  .0435116  .0473637  .0506832 .0535590 .0560694  .0582511 

14 .0297189 .0345220 .0386476  .0422127  .0453100  .0480141  .0503852 

15  .0212903 .0263854  .0307669  .0345620 .0378671  .0407595 .0433020 | 
16 .0136054  .0189618  .0235845 .0275957  .0310960  .0341657 .0368697 i 
17 0065359  .0121457  .0169935  .0212082 .0948996  .0281295 .0309861 

18 0 .0058480 .0109091 .0153161  .0191746 .0225696 .0955705 

19 0 .0052632 .0098522 .0138756 .0174206 .0205584 
20 0 -0047619 .0089419 .0126294 .0158973 
21 0 0043290 · .0081522 .0115440 
22 0 .0039526 .0074627 
23 0 .0036232 
24 0 
25 

n> 

m 26 27 28 29 30 31 32 33 


=] 


1.0000000 for all n 


2 -5000000 forn 24 
3 .9333201 .3333245  .3333275  .3333294  .3333307 .3333316 — .3333322  .3333326 
4 -2498115  .2498587 .2498940  .2499206  .92499404 ` 2499553 .2499665 .2499749 
5 «1992352 .1993895  .1995125  .1996106  .1996889 ` 1997514 .1998012  .1998411 E 
6 -1648585 .1651676 .1654230  .1650342 .1658090 .1659539 .1660741 .1661738 | 
a .1396107 — .1401025 .1405137  .1408616 1411565 -1414068 .1416195  .1418004 
8 -1200468  .1207173  .1212923 .1917864 .1222119 . .1225788  .1228958 .1231699 | 
9 -1042646 .1051161 -1058542  .1064955  .1070539  .1075409  .1079665 0 
10 0911518 .0921776 -0930737 .0938586 .0945476  .0951537 .0956879 )4 
11 0800030  .0811943  .0822415  .0831643  .0839795  .0847012  .0853413 .0859103 
12 0703490 — .0716970 .0728875 .0739417 .0748775 .0757109 .0764525 ‚0771157 
18 ‚0618648 .0633607 .0646867 .0658656 ` 0669162 .0678548 .0686950 .0694488 
14 -0543172 — .0559525 .0574068 .0587037 0598633 — .0609026  .0618362 .0626766 
15 -0475339  .0493009 .0508763 .0599851 .0535481 .0546833 .0557058 .0566289 
16 .0413846 .0432760 .0449669 .0464810 -0478422 .0490684 .0501756 0511775 
17 -0357685 .0377778 .0395768 .0411929 -0426467 .0439595 .0451473 .0462243 
18 .0306065 — .0327276 — .0346298 .0363408 .0378840 — .0392793 .0405439 .0416926 
19 0258351 — .0280625  .0300629 .0318649 .0334925 .0349665 .0363044 .0375215 
20 -0214030 — .0237315 0258255  .0277142 .0294224 .0309714  .0323793 .0336619 
21 .0172679 .0196929 .0218762  .0238477 -0256329  .0272536 0287284 0300736 
1. CA -0159121 .0181807 — .0202314  .0220902 0237795 .0253185 .0267237 
ee үзе .0123601 — .0147103 .0168368 0187662 .0205213 .0221217 .0235844 
©з foc 00901283 .0114409 .0136400 0156371 .0174553 — .0191148 .0206327 
-0030769 .0058480 .0083516 0106206 .0126897 .0145616 .0162777 .0178488 
26 0 :0028490 .0054250 .0077611 :0098857 .0118229  .0135936 .0152159 
28 0 0026455 — .0050463 .0072310 .0099245 .0110478 .0127193 
29 0 0024631 .0047059 .0067535  .0086276 .0103467 
30 0 .0022989 .0043988 .0063218 .0080868 
5 0 .0021505 .0041209 .0059303 
32 0 ‚0020161 .0038685 
33 ) 0 .0018939 
0 
SS ыш сыту Lr qoM DMNMNNEN 
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TABLE3. VALUES OF С (оона) 

Ст(т) 
А 
п ә 
т 34 35 36 37 38 39 40 41 42 

1 1.0000000 for all n 
2 .5000000 for n>24 

.3333333 for n>38 
3  .3333328 .3333330 .3333331 .3333332 .3333332 .3333333 .3333333 for n>38 
4  .2499819 .2499859 .2499894 .2499991 .2499940 .2499955 .2499966 „2499975 .2499981 
5  .1998730 .1998984 .1999188 .1999350 .1999480 .1999584 .1999667 .1999734 .1999787 
6 1662566 .1663255 .1663827 .1664302 .1664698 .1665027 .1665301 .16615530 .1665720 
7 .1410544 .1420856 .1491976 .1422931 .1423746 .1424442 .1425037 .1495546 .1425981 
8 1234073 .1236131 .1237916 .1239407 .1240815 .1241988 .1243008 .1243897 .1244671 
9 (1036654 .1089519 .1092036 .1094250 .1096199 .1097916 .1099431 .1100707 .1101948 
10  .0905764 .0909451 .0972731 .0975637 .0978221 .0980519 .0982566 .0984390 .0986017 
11  .0864167 .0868683 .0872715 .0876321 .0879548 .0882441 .0885036 .0887367 .0889462 
12 (0777093 .0782413 .0787190 .0791485 .0795352 .0798837 .0801982 .0804824 .0807398 
13  .0701262 .0707361 .0719801 .0717828 .0722320 .0726388 .0730075 .0733423 .0736464 
14  .0034345 .0641193 .0047390 .0653007 .0658105 .0662739 .0060957 .0070799 .0674305 
15  .0574037 .0582202 .0589008 .0595311 .0600994 .0600176 .0610907 .0615232 .0619189 
16  .0520858 .0529109 .0530617 .0543460 .0549707 .0555417 .0560643 .0565434 .0569830 
17  .0472028 .0480935 .0489058 .0496476 .0503263 .0509481 .0515186 .0520426 .0525245 
18  .0427382 .0436916 .0445626 .0453597 .0460903 .0407008 .0473773 .0479446 .0484674 
19 0386911 0396445 .0405718 .0414218 .0422021 .0429196 .0435802 .0441893 .0447515 
20 0348327 0350036 .0368848 .0377855 .0386136 .0393761 .0400792 .0407284 .0413286 
21 0313031 .0324290 .0334620 .0344113 .0352853 .0360910 .0368350 .0375228 .0381596 
22 0230003 0201880 .0302707 .0312668 .0321848 .0330321 .0338154 .0345404 .0352124 
23 00249941 .0261535 .0272838 .0283248 0202852 .0801725 .0309937 .0317545 . 0324605 
24 (0220242 0233023 .0244785 .0255627 .0265639 .0274898 .0283474 .0291428 . 0298815 
25 0192902 0206152 .0218354 .0229613 .0240017 .0249648 .0258575 .0266863 . 0274566 
26 0167052 .0180754 .0193382 .0205041 .0215825 .0225813 .0235080 .0243089 .0251097 
27 .014550 .0156687 .0169725 .0181771 .0192920 .0203255 .0212849 .0221769 .0230072 
28 0119270 .0133827 .0147201 .0159681 .0171183 .0181852 .0191163 .0200983 .0209572 
29  .0097103 0112066 .0125883 .0138664 .0150508 .0161500 .0171717 .0181228 .0190092 
30  .0075954 .0091310 .0105497 .0118628 .0130802 .0142107 .0152620 .0162412 .0171545 
31 00055740 .0071475 .0086021 .0099490 .011198& .0123592 .0134393 .0144458 .0153849 
32 00036386 0052480 .0067382 .0081179 .0093983 .0105885 .0116964 .0127294 .0136937 
33 00017825 0034986 0049515 .0063630 .0076735 .0088922 .0100272 .0110858 .0120745 
34 0 .0016807 .0032362 .0046786 .0060183 .0072647 .0084259 .0095094 .0105218 
> 0 “0015873 .0030597 .0044277 .0057010 .0068877 .0079955 .0090309 
36 0 .0015015 .0028972 .0041965 .0054081 .0065394 .0075972 
37 0 “0014225 .0027473 .0039829 0051372 ..0062168 
38 0 .0013495 .0026087 .0037853 .0048862 
39 0 .0012821 .0024804 .0036020 
40 0 .0012195 .0023613 
41 : 0 .0011614 
42 9 
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On(n—1) 


TABLE 3. VALUES OF —7-—-' (Conid.) 
От(пт) 
T ———————————— 
п ә 
т 43 44 45 46 47 48 49 50 
1 1.0000000 for all » 
2 .5000000 for n>24 
3 .3333333 for n>38 
4 2499986 .2499989  .2499992  .2499994 .2499996 .2499997 .2499997 =. 2499998 
5 “1999830 .1999864  .1999891  .1999913  .1999930  .1999944 .1999955 .1999904 
6 .1665878 .1666009  .1666119  .1660210 .1666287  .1666350  .1666403 .1666447 
si .1426354 .1426672  .1426945  .1427178  .1427378  .1427549 .1427695 .1427821 
8 .1945340 —.1245935 .1946897  .1246897  .1247288  .1247029  .1247928  .1248188 
9 .1102991  .1103913  .1104729  .1105451  .1106090  .1106656 .1107158 .1107602 
10 .0987470 — .0988707 .0989927 .0990965  .0991893 .0992724 .0993468 .0994135 


11 .0891340 .0893043  .0894572 .0895950  .0897194 .0898316 .0899330 .0900246 
12 .0809719 — .0811826  .0813736  .0815469  .0817042  .0818471  .0819771 .0820953 
13 .0739230 .0741749 .0744043  .0746136  .0748045 .0749789 .0751383 
14 .0677505 .0680431 .0683107  .0685558  .0687804 .0689864 .0691754 
15 .0622814 .0626139 .0629191  .0631995  .0634574  .0636947 .0639132  .0641147 


16 .0573868 .0577581 .0581000 .0584150  .0587054  .0589736 .0592212 
17 .0529683 .0533774  .0537549 .0541035 .0544259 .0547241 .0550003 
18 .0489498 .0493954 .0498074 .0501888  .0505420 .0508696 .0511736 . 
19 .0452712 .0457520 .0461974 .0466104  .0469937  .0473498 .0476809 .0479890 
20 .0418842 .0423991 .0428768 .0433204  .0437328 .0441165 .0444738 .0448069 


21 .0387498 — .0392976 .0398064 .0402796 .0407202  .0411307  .0415135 .0418708 
22 .0358361 .0364155 .0369545 .0374563  .0379240 .0383604 .0387679  .0391488 
23 .0331164 .0337264 .0342944  .0348239  .0353180  .0357794 .0362108 =. 0366145 
24 .0305684 .0312080 .0318041  .0323603  .0328798  .0333056 .0338201  .0342 
25 .0281735 .0288416 .0294649 .0300469  .0305911  .0311003  .0315773 .0320215 


26 .0259157 .0266113 .0272608  .0278679  .0284359 .0289679 .0294666  .0299345 
27 .0237812 — .0245035 .0251784  .0258097  .0264008  .0269548 .0274745 .0279626 
28 .0217583 .0225064 .0232059 .0238606  .0244740  .0250494 .0255895 .0260971 
29 .0198366 .0206097 .0213330  .0220104 .0226455  .0232416  .0238016 .0243281 
30 .0180073 .0188046 .0195510 .0202504 .0209065  .0215227  .0221019 .0226468 


31 .0162624 — .0170832 .0178520  .0185728  .0192493  .0198850  .0204828 .0210456 
32 .0145950 — .0154386 .0162291  .0169707 .0176670  .0183216  .0189376 .0195177 
33 .0129991 .0138648 .0146764 .0154381  .0161537  .0168267 .0174603 .0180573 
34 .0114690 — .0123563 .0131884  .0139696 .0147040  .0153949 .0160456 
35 .0100000 .0109082 .0117602  .0125605  .0133131  .0140214 .0146887 


36 .0085877 — .0095162 „0103876 .0112065  .0119767  .0127020 .0133856 .0140305 
37 .0072281 — .0081762 .0090667  .0099036. .0106911  .0114329  .0121323 .0127924 
38 .0059176 .0068852  .0077939 .0086484 „0094527  .0102106 .0109254 .0116003 
39 .0046531 .0056395 .0065662 .0074378  .0082585  .0090321  .0097619 .0104512 
40 .0034317 .0044364 .0053806 .0062689  .0071056 .0078944  .0086390 .0093423 


41 .0022506 — .0032731 — .0042344 .0051391  .0059913 .0067951: .0075540 .0082710 | 
42 .0011074 .0021475 .0031254 .0040460  .0049135  .0057319 .0065047 .0072351 


43 0 .0010571 .0020513 .0029874  .0038698  .0047024  .0054889 .0062324 
44 0 .0010101 .0019614  .0028584  .0037049 .0045047 .0052611 
45 0 -0009662 .0018773  .0027375  .0035504 .0043192 
и 0 .0009251 .0017986 .0026242 .0034053 
48 0 .0008865 .0017246 .0025177 
49 0 .0008503 .0016552 
50 5 0 .0008163 
0 
BM ЕКИ E E netos o 9 


От(т—1) 


* Note: Values of ("in—1) ean also be obtained from ОН 
qn 


От(п) 
l1. Cm(n—l) , Oma(n—1) 
т От) Опт) ` 
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ON SIMPLE RANDOM SAMPLING WITH REPLACEMENT 


7. COMPARISON BETWEEN WITH AND WITHOUT REPLACEMENT SIMPLE 
RANDOM SAMPLING SCHEMES 


In conclusion let us compare the two simple random sampling schemes for the 
purpose of estimation of Y. If we draw a simple random sample with replacement of 
size n, then the variance of the sample mean is с/т. Further, in a simple random 

2 c 
sample without replacement of size п, the variance of the sample mean is Zh). 


Since 

gt (N—n) <= 98 z ч 

в (N—1) т 
it is usually claimed that sampling without replacement is better than sampling with 
replacement. Basu (1958) has pointed out that this comparison is not fair because the 
cost of selecting a sample of size n in sampling without replacement is greater than 
the cost of selecting a sample in sampling with replacement. For comparing the two 
sampling schemes, it would be appropriate to take into account the cost involved in 
the selection of two different samples. The comparison, thus, mainly depends on the 
choice of the cost function, and no sampling scheme can be said to be superior to the 
other unless the cost function is known in advance. Let us, for illustration, consider 
the case where the cost of sampling is proportional to the number of distinct units 
drawn. Thus the expected cost of selecting a sample with replacement of size n 
is equivalent to the cost of selecting a sample without replacement of size 


E(v) = x[i- eS it Basu has shown that in this situation the sample mean 


of the sample with replacement is worse than the sample mean of the equivalent ` 
sample without replacement. We now compare the sample mean g of the equivalent 
sample without replacement with the following estimator of with replacement 


sample : 


. _ МУ i. 
Уна) = ENIN —)] 


Tt has been shown that 


Jj Ee y NN»), 7:1; 
Vow) = уа [ ХУУ } GA) 
ee 7.2 
and Y = [ку x]? (7.2) 
Since М№/(М—у) is a convex function of (1 < v € n < N), 
133 
E [NIN —v)] > NEIN ЕУ] | * - x] .. (13) 
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From (7.3), it is evident that the first component of V(,) is smaller than 
V(g). Thus for a population whose coefficient of variation is sufficiently large V(j,(,,) 
would be smaller than V(g). This comparison shows that the sample mean of without 
replacement sample cannot be uniformly better than all estimators of with replace- 
ment sampling. 


However, the comparison made above is not very satisfactory. First, because 
of the linearity of the cost function and secondly, because E(v) is not necessarily an 
integer. We hope that for some other cost functions also, similar situations may be 
found out where with replacement sampling would fare better than without replace- 
ment sampling. 
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ALMOST UNBIASED ESTIMATORS BASED ON 
INTERPENETRATING SUB-SAMPLES 


By M. N. MURTHY 
Indian Statistical Institute 
SUMMARY. In this paper a technique is given for estimating unbiasedly any non-linear function 


of estimable parameters., The technique consists in estimating the bias of the usual estimator using esti- 
mates based on interpenetrating sub-samples and then correcting the estimator for its bias. 


1. INTRODUCTION 


The question of evolving & generalized unbiased estimator for any sample 
design has been considered by Midzuno (1950), Godambe (1955) and Nanjamma, 
Murthy and Sethi (1959) for certain classes of parameters. Murthy (1962) has sug- 
gested a technique of generating unbiased estimators for any sample design for the 
class of parameters which can be expressed as a sum of single-valued set functions 
defined over a class of sets of units belonging to the finite population under considera- 
tion. Examples of such parameters are the population total Y and the population 
variance which can be expressed respectively as 


and 


In this paper we shall supplement the generalized theory of unbiased estima- 


tion (Murthy, 1962) by giving а procedure of obtaining unbiased (or almost unbiased) 
estimators for non-linear functions of parameters each of which can be expressed as 
single-valued set function defined over a class of sets of units belonging to а finite 


population. Examples of such parameters are given by ratio of population totals 
correlation coefficient, ete. 


of two characteristics, population standard deviation, 


The procedure of obtaining this unbiased estimator consists in estimating the 
the same non-linear function of 


bias of the usual estimator which is taken as 
unbiased estimators of the parameters as the parametric function under consideration, 
on the basis of interpenetrating sub-sample estimates. This procedure is based on 
the technique used by Murthy and Nanjamma (1959) in estimating the bias of а 


ratio estimator. 

paper is likely to be of much help in survey practice, 
between characteristics and between parameters, 
s or the population coefficient 


The procedure given in this 
since the estimation of relationships 
such as a ratio of population totals of two characteristic 
of variation are usually of much interest in sample surveys. 
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2. PARAMETRIC FUNCTION 


Let the parametric function /(@) be a single valued non-linear function of the 
parameters (0, 0,,...,0,), where 0; (i = 1,2,...,k) can be expressed as 


б= X fla) cc E 3.1) 
aieA; 


where f(a;) is a single-valued set function defined over the class ‘А, of sets ‘a,’ 
consisting of units belonging to the population X. 


Suppose we have defined the sample space ‘S’ of samples ‘3’ with a suitable 
probability measure such that itis possible to estimate the parameters (0,, 0., ..., 0) 
unbiasedly using the procedure given by Murthy (1962). That is, it is assumed that 
the sample space is so specified that each аА; (i = 1, 2, ..., k) occurs in at least 
one ‘5’ and that each.‘s’ contains at least one set ‘аг in ‘A, (i = 1, 2,...,k). Then 
a generalized unbiased estimator of 0; (i = 1, 2, ..., k) is given by 


== X Ла), a)/P(s) e (2.2) 
i а сз > 
where ` | У фа) = 1. 
: PSEUD : 


In fact, we can make the above formulation more general by relaxing the assump- 
tion that буз (i = 1, 2, ..., k) are estimated from the same samples. In other words, 
0; (i = 1, 2, ..., k) may be estimated on the basis of the same, overlapping or non- 
overlapping samples drawn with the same or different sample designs. 


Let (t 1, ...,%) be unbiased estimators of the parameters (6,, 0, ..., Or). 
Then an estimator of /(0) can be taken as f(t). 1ff(0)is a linear function, obviously 
Л) will be unbiased for /(0). But here we are taking /(0) as а non-linear function of 
(б, Ons ..., 9) and hence f(t) will, in general, be biased for 70). | 


3. BIAS AND MEAN SQUARE ERROR 


Tn this section approximate expressions for the bias and the mean Square error 
of the estimator of f(t) are obtained by using Taylor series symbolically. It may be 
noted that in statistical practice one is interested not so much in the convergence 
properties of the infinite series representing a function, but in finding out whether 
the first few terms of that series will give a good approximation to the function. 
Because of this, the question of the validity of the application of Taylor series expansion 
to the case of a finite population estimator will not be considered here. However, 


it will be assumed that the estimator t; is such that | 2 X 1 especially for 
i 


estimators occurring in the denominator of the function f(t) so that the first few terms 
of the expansion can be expected to give a good approximation to the function. This 


latter statement has been empirically verified in the case of applying this expansion 
to a ratio estimator, 
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1f the sample size is fairly large, the assumption m <1 will be valid. 
Dum : 
Let /,—6; (12-6), (i—1, 2; ..., k) and #06, tp 1t), (05, Oa +- бу), ее, еь 6). 
Expanding f(t) in a Taylor series about t = 9 and neglecting terms of degree greater 
than 2 in ез, we get 


Nt) =/®+ ва (2) | 
t-0 — — 


ШЕК OH Еа ду 
=) У 02 22 > 00, I Tua S gl 
+512 (а) „+ il > es as). | > 
It may be observed that for certairi parameters there will be no terms of degree 
greater than 2 to neglect. An example of such a parameter is the product 0,0, with 
the estimator t,t), Taking expected value of f(t) in (3.1), we find that the bias of 
/@) correct to the second degree of approximation is given by 


argo] = | AC), nee zu б, ый] с. @® 


where кй) = Вид), = 1, 2 s. 
The mean square error of f(t) to the second degree of approximation is given by 


2 
моо) = ї0—/®р=® [s (2) |] 


teo 


( ш 0» 


m TUAM ж de (af 
-30 мон Oa 


i=l j>i t=0 


4, -BIASES OF TWO ESTIMATORS ы. 
Suppose the sample on which the estimate t; of 6; (i = 1,2,..., k) is a 
is selected in the form of n independent interpenetrating sub-samples. Let ti, be the 


unbiased estimate of 0; based on the s-th independent interpenetrating aub-sample 
(i= 1, 2, ..., k; в =1,2,..., т). In this case let us consider the following two esti- 


mators T, and T, of f(A). 


m= = 5 ft) = (41) 
whee t = (tp Ба PEST (8 = 1, 2; ssh) 
and ИЕ 2% (4.2) 
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Applying the result (3.2) to Т„ in (4.2) we get 


20) = р P ( ar) Же ыщй)+? b EE (аа) Ea ы) | | 


AS ео 
where Blij) = Е@ —0,)(%—0,) = E A (47), 


дыф) = Е,—0)(6,—6). 
That is : 


of 5 s of с] 
(9) 1-0 Iii) 2 A A (as) AA fuu) : 


12 ? | 
= 1 E Bj. оз E 
The bias of the estimator T, in (4.1) is given by 


в, = ВТ) = BL ss (4.4) 


Comparing (4.3) апа (4.4) we find that the bias of the estimator T, is n times that of 
the estimator T,. 


5. ESTIMATION OF BIAS 


As observed in Section 4, comparing the biases of the estimators Т, and Т,, 
we get 


В, =n B, (5.1) 
Using this result we can derive an unbiased estimator of the bias B. 
ЕТ.) = f(6)--B, 
Е(Т,) = f(0)4-B,. 
Hence E(T,—T,) = B,—B, = (n—1)B,. 


Thus an unbiased estimator of B, is given by 


п, ... (5.2) 


The variance of the estimator of B, is given by 


5 ИГ 
V(B,) = Gi (a?—2pa+-1) ... (5.3) с 
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where a? = V(T,)] (71), and р is the correlation coefficient between the estimators 
Т, and Tp. For most of the sample designs a? and p will tend to 1 as the sample size 
increases and hence the variance of the bias estimator will tend to 0 as sample size 
increases. 1% may be observed that an unbiased estimator of the bias of T, is given 
by 
B, =a (T,—7,). ... (5.4) 


6. (ALMOST) UNBIASED ESTIMATOR 


Since an unbiased estimator of the bias of the estimator Т„ has been ob- 
tained in Section 5, the estimator 7, can be corrected for its bias, thereby obtaining 
an unbiased or almost unbiased estimator of f(0) according as the third and higher 
degree terms in ‘e’ become 0 or not. Inthe latter case, the estimator is said to be almost 
unbiased since it is unbiased only to the second degree of approximation. The esti- 
mator corrected for its bias is given by 
TiTa МТ. ME 


тет Чү (D 


Tt may be noted that this is the corrected estimator we get, even if we correct the 
estimator Т, for its bias. 
The variance of the corrected estimator is 


ит, = E (n%a*—2pa-+1). 2. (63) 


The gain in precision in using T, instead of T, is given by 
MART ar port в 
M, (n—1)y(0* 4-25) 


where 22 is the ratio of the square of the bias of Т, to the variance of T, Ifthe sub- 
igibly small. Neglecting 2? in the above expression, 


G(T.) == 


sample size is large 22 will be neg! 
we find that the gain in precision will be positive if 

(2n—1)a*—2npa+1 < 0 
which will be true if ‘a’ lies between the roots of the equation 
(2n—1)a?—2npa+1 = 0. (6.4) 
he minimum value of n which makes the corrected esti- 


For given values of a and p, t i йр А tivel: 
mator more efficient and the value of m which maximises the gain are respective y 


given by 
(179) 141 3 ss (6.5) 
ae 
- (1—ра) - ... (6.6) 
апа ра) 
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TABLE 1. SHOWING THE MINIMUM: AND MAXIMUM VALUES 
OF G(7.)) AND THE CORRESPONDING VALUES OF n 
FOR DIFFERENT VALUES OF р AND a (P > a) 


— 


вт. a cn p г minimum ' maximum 

ji nm — QD) n т) 
1 0.6 9.7 6 0.0089 10 0.0192 
2 0.8 3 0.0556 4 0.0988 
3 SE S0. 7 73. ^. 6.0886 3 0.3056 
А Шү 0.8 4 0:0113 7 0.0266 
5 0.9 2 0.1020 3 0.1684 
6 0.8 0.9 з 0.0469 4 0.0486 


` Source ғ Murthy, M. N. and Nanjamma, N. S. (1959): Almost 
unbiased ratio estimates based on interpenetrating sub-sample esti- 
mates, Sankhya, 21, 381-392. 


7. ILLUSTRATIONS | 


In this section, the results derived in the previous sections are applied to 
some particular cases, 


Case (i): f(6) = 6. Let t be an unbiased estimator"of 0 based on any 


sample design. Then an estimator of 700) is given by 


ЛХ) =. | КЕ (7.1) 


The bias and mean Square error of f(t) correct to the second degree of approximation 


are given by - 
BIJO] = Kk—1) 36) e. (1.9) 
ЭТ] = PEOP MEUS 
where C? is the relative variance of { = V(t)/02], since 


df ua dap 20 ae 
й — № and 7 Щр туз, 


The bias relative to the mean square error is - 


Bel 1 gy 
Mj) а (k—1)202, e. (14) 


2 


From (7.2) апа (7.4) we see that the bias of f(t) and its contribution to the mean Square 
error both decrease as the sample siz 


е increases, since for most sample designs С? 
decreases with increase in sample size. 
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ALMOST UNBIASED ESTIMATORS. 


., n) are unbiased estimates of Ø based on n independent 
mples, the following two estimators T, and 7, of f(0) can be 
dod 
= „. (8) 
р | lat of T, is n times that of the bias of T,. Hence an unbiased 
Es ^, is given by 
: Ea MN о i-o. = 
B E RRN ÊT) = aay а. (тл) 
4 ес imator is given by 
Ў БЫ y n — Ea г : 
ET T; = a) ` HANS) 
i d that the expression for bias and the corrected estimator 
E vi ї if k in f(0) is 2. 
ion Жыры (ө). The correlation coefficient a sd 
соу (т, y). ; RO 
=, ЕЙ 
^7 Ута Vo) ү 
ic function is of the form 
| ө, Й 
= -—==- aa LO. 
fO= 75 8; (7.10) 
SANU 
0 = VD (7.11) 
ч sed estimators of б, 0, and 6, respectively. The bias 
“error of f(t) correct to the second ct of лыш ате 
EU FO tt, mg) ао) +203) we (11%) 
чл = UO ts ныны) nein COLLE 
ъ= "NCC. 
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Let t; (i = 1, 2, 3) be unbiased estimates based on the s-th independent inter- 
penetrating sub-sample (s = 1, 2, ..., т). Then using the two estimators 


ДЕР л сй» 
P == ... (7.14 
s cs V tostas f ) 
В 

T, = —== (7.15) 

= Vi t 

we get the following corrected estimator of p 
qo tfe. ws (7.16) 


(n—1) 


Case (iii): Regression Estimator. Let у and x be unbiased estimators of the 
population totals Y and X respectively and let b be a consistent estimator of the 
regression coefficient obtained by taking the ratio of unbiased estimators of the 
covariance between x and y and the variance of =. 


The regression estimator is E y=y-+0(X—z). 205 (7.17) 


The estimator in this case is of the form 
/@ = ых. ... (1.18) 
УС 


The bias and the mean square error of this estimator, correct to the second degree of 
approximation, are given b; 
B[f(t)] = PX (Vz — va) 2 (7.19). 


and М] = Viy)—2B cov (а, у)-Е/#У(@@). 2. (1.20) 
_ By defining the two estimators Т, and 7, on the basis of n interpenetrating sub-sample 
estimates we get the corrected estimator as 
T,—T. 
` Д ата. 
ae) 
Case (iv) : Skewness (В, = p,[n$). The parametric function is of the form 
ХӨ) = 6/08 
and an estimator of /(0) is given by 
fO = 4/8 


where /, and #, are unbiased estimators of 0, and 6, respectively. The bias and the 
mean square error of f(t), correct to the second degree of approximation, are given by 


В[](@] = (380—203) ... (7.21) 
and MISO] = Bion-- 435—401) eso (1.22) 
where vj = E(t—0;)(6—0,)/0,0;. 
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Defining suitably the two estimators T, and Т, based on n interpenetrating sub- 
samples we get the corrected estimator, as before, as 


A nTa T, 
пер“ 


8. ESTIMATION OF BIAS 


General Case. Suppose (0) is the parametric function of the parameters 
(0,, 0, ..., 0;) and f(t) is an estimator of f(0) based on the estimators (f, ta, ..., tp) 
which are unbiased for the parameters (04, Oz, ..., 0,). Let t; = 0;+h, i = 1, 2,..., К. 
Applying Taylor series expansion to f(t) about t= symbolically and neglecting 
terms of degrees greater than p in h/s, we get 


oe 
FO = Өф, р. PEN Pto MS „в 
Taking the expected value of (8.1), we get the bias of f(t) as Күр 
1 7 
ву] = ij Йй, 5 Blinis i) Баа т ... (8.2) 


das is, es 


Suppose t; is an unbiased estimate of 0; BEA on the s-th independent inter- 
penetrating sub-sample (i= 1, 2, ..., k; в = 1, 2, ... т). Let us consider the follow- 


ing p estimators of f(0) 


m = 1,2, ... p—l, n .. (8.3) 


=i 
(50 
where (d (m) = (т), ln), әт), 


i(m) being the mean of the estimate t; based on a combination of m sub- -samples . 
taken from the n independent interpenetrating sub-samples and X denotes summation . 
over all combinations of т sub-samples. formed out of n sub-samples. 


The bias of Т, to the p-th degree of approximation is given by 


В» — B(T m) = 


= ы УЕ [ 2 б à 4 (hs ia М) oi, a; ) | 
... (8.4) 


(su 
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where A; = E t his. After simplification the bias of T,, may be expressed in the 
M s=1 
form 


a (m = 1, 2, ..., p—1, n) ... (8.5) 


Ras 
j=2 


where A, is a function of the j-th order moments and product moments of the estimators 
(tis ta) ..., £j) and of terms of the form 


И GE ET rj) 
е dii, е. 2 
From (8.5) we see that in the series of estimation {Tm}, B(T,,,,) < B(T,,). 
Since ЕТ») —f(0)3- By, 
we get E(T,—T,) = B,—B, = $ (1) Aj. ... (8.6) 
1— “т 1— Dm £m m^ 
Let D, = (T,—T,,). The equation (8.6) can be written as 
ЕР) = АЛ, E (8.7) : 
where | D = (Dy Ds, ..., Dy 4, Dy) | 
А = (As As, ..., A5. Ay) 
1 1 1 1 
IAEA C cr I- 
1 1 1 1 
-|l——- — БАЖЫ] ерд ——— 
рашта GF ‘бе 
1 1 1 1 
ELT DEM es eras EGTA, 


It may be noted that in (8.7) we have (p— 1) equations in (р—1) unknowns. It may 
be observed that we are considering p estimators since there are (y—1)A’s and /(0) 
to be estimated. Solving (8.7) for 4 we get 


A= E(D) ^3. ... (8.8) 
Taking В = (В,, By ..., B, 4 B,), we get 
B = A(e--A) ... (8.9) 


where ‘e’ is a (p—1, p—1) matrix whose elements are all equal to 1. Substituting 


in (8.9) the solution for 4 obtained in (8.8), we get unbiased estimators of the biases 
of the estimators, namely, à 


B = рА-%—р. | 
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Case (ii): р=3. 


Let us consider the 


(8.1 1) ) 
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we get after simplification 


4 nè 


n—2 
B= LN. EU eu (8.15) 
E 12m n? 
B=- nE 51-08)", 8-10] 
Р apod (3n—2) (8.17) 


о 103)" 
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ON SAMPLING WITH UNEQUAL PROBABILITIES 


By P. К. РАТНАК 
Indian Statistical Institute 


SUMMARY. This paper deals with the problem of deriving improved estimatore in sampling 
schemes with unequal probabilities of selection. Тре improved estimator of the population total, Y, (Basu, 
1958), is derived. In addition, two sets of estimators of Y and Y? are given. Тһе first set of estimators 
is unwieldy to compute, while the second set is simple. "Тһе second set of estimators, though less efficient 
than the first, is more efficient than the usually employed estimators. 

It is proved in subfield terminology that if , and G, are two sufficient subfields and K is a set 
common to ,S, and Ga, then §,K+.$,K’ is also а sufficient subfield. Hence the subfield SK + SK’ 
can be used to derive improved estimators by Rao-Blaekwell theorem. Generalisation of this is also given 
in ease of countable number of subfields, Application of this result to sampling with unequal probabilities 
is given. 

1. INTRODUCTION = 
Consider a population containing № units. Let y; be some real-valued charac- 
teristic of the j-th population unit in which we are interested.1 Suppose that a sample 
of size n is drawn from the above population with unequal probabilities of selection 
(with replacement), P; being the probability of selection associated with the j-th 
population unit (ХР; = 1). Tf for the i-th sample unit, we record its Y-charac- 
teristic у, probability of selection фр; and unit-index w; then the sample of 


observation is 
8 = (21,1, ... LAN 
where ж; = (Yp Pi щ):2 


Tt has been shown by Basu (1958) that the *order-statistic" 
T = (tay £g» eo Xo) 
(where aq), ča: Top ате the distinct units in the sample arranged in ascending 
order of their unit-indices) is sufficient. 

Therefore, if g(S) is some estimator depending on the sample S, for any bone 
vex (downwards) loss function, an estimator uniformly better than 5) is given 
by E[g(S)|T]. In the subsequent sections this result is used to derive improved 
estimators of the population total and its square. 


2, ESTIMATION OF THE POPULATION TOTAL 
The usual estimator of the population total 


УИ 
E 21 
is, given by { p ( ) 
where = e 


(1) to (7) unless otherwise stated. 


1 j varies from 1 to N,i from 1 to n and ($) from 
to the sample. 


2 Capital letters refer to the population and small letters 
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Theorem 1: For any conver (downwards) loss function, an estimator uni- 
formly better than 2 is given by — 


= FT = Oa re о (2.2) 
Puy 
where Qu, = Реа...) рау...) 4 :..(— "057. (2.3) 


[pay 9o)" —X(pqy4-.. oc) --- (7) 7X0] 


the summations X, and Xi stand for all combinations of p's and айй combinations of 
B'S containing руу (chosen out Of Pay Dia, +++) Ру) respectively. 


Proof : Obviously by Rao-Blackwell theorem, an estimator uniformly 
better than z is given by 


EEIT) = z(9 (T) = x YO pr. = s pm. B (24) 
eim = (2) = z Jo. Pte, = a] 
yr 0—1! ар ayy 
ао ffo. 1 
But ТУР Pla, = «|T] = nS e cm Bee (2.6) 


ao T... | Pay Ufo) 
- where 2/ means summation over all integral «’s such that 
aa > 0 fori —1,..., v and 9) - 0) -..., Но») = n, 
and &” means summation over all integral о?в such that 
ар 20,0452 0 for í4i— 1,2; 55 v 
and ааа... Бау = (n— 1). 


It can be seen by induction over y that 


Й n! «у 
а uei ets 
ау 1... оу! 


р” = [(рау+...-Еш))”— X, (Pay. -2o-1)" 


ceo) E, рау; 
and 


у” (n—1)! ay p” 


Re 12 .. = Фо. FPT Ура)... 99-4) 
(1) 5e Hy | a 


Tes. n. (26) 
Using (2.4), (2.5) and (2.6), we get 


2, = E(2| T) = у, 0, ¥o 
Pw 


Hence the theorem ig proved. 
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The above estimator, though better, is not very useful in large samples because 
of cumbersome computation of C's. In Section 4, we shall derive a simpler estimator 
of Y better than z. Table 1 gives exact expressions of 2, for п = 3, 4 and 5. For 
n= l and 2,2, and z are identical. т 


TABLE 1. z, FOR n=3, 4 AND 5 


n- 3 4 5 
Р 
1 ya» Ya Yay 
Pay Pa 4 Pay 
2 У (2paj po) t x [eom -Pip lo = | fone! ya 
3(pay +p) 
POE [ioo et, ot, | | (potpao) rf ЕР, ] 

3 b x [орото iss = [retra tp во) 

Pa) $ 


рау + Pay Pol 


2 2 yay 
mee Pes rtp] n 


2 2 
p ГА TP, UE) 


- 8 (Poo Pay Peso» 3-PaPo») ] 


ya 
4 — oun = x Е teat | 


5 [ purtrm teat Po | 


zn 
Pay 


а ДЕЗ АВСНАА шшк = 


3. Estimation or Y? 


The problem of finding an unbiased estimator of Y? arises in most problems 
at 
of variance estimation of estimators of Y. The usual estimator of У? is 
Tee oS 81 
= E ща. 
И пата 


Theorem 2: For any convex (downwards) loss "function, an estimator uni- 


formly better than z, is given by 


5 OIA Aou s (8.2) 
Е(,|Т) = 2 Се ot py Cin P fun 


А d 21(5 тте) 
Remark: zp when у=(®—1) may be expressed іп a simple form i Z(n-1) = ^ Po = Pd 
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where gay = 20а... o) 3— 2i (Pay +--- Бро)" 2+. --(—) 905? 
ү; [(Фа)+...-Е2))"— УА(р()у-Е...-ЕРо-›)"-+Е... (7) 24 ру] 
and бшу= 


Фере а... Бро) EF (Dat. 3-26)" +... (=) "(poten)" I] 
(ри... Бо)" Zi (Bayt-----2o-)) +- (—)7 2, pay") 


(3.3) 
the summations X, and Zi have been defined in (2.3) and the summation У" stands for all 
combinations of p's containing py, and pq. 


Proof: Obviously 


n 
B(s,|T) = E [ tw eR али | = Иа z|T) 
= X Zaya Ра = Uy, ta = tan |T]. ... (3.4) 
It is easy to see that 
—2)1! 
po —(0—2)! + о... tan 
Play = яв» а = ta |T] = А = оо» ... (3.5) 
um nr ray yan 
z 1) Yrs Qv) [^u Py, 
(n—2)! С « 
р... р 
апа Ре, = ta» 2, = tan |T] = “ш ае! m =, ... (3.5.1) 


а р) 
Ф, Ф 


LT Hy! “oD m 


where 2, Х” have been defined in (2.5) and E" means summation over all integral 
о?в such that 


а... Бо) = (%—2), б) > 0, dun > 0 and ay > 0 for k it = 1,..., у. 
Tt can be proved on lines similar to (2.6) (by induction over v) that 


Plea) = to ta = to | T] = бы) 
(3.6) 
and Ри) = у, % = tan | T] = C an) 
Using (3.4) and (3.6), we get 
» » 
Ez, | T] = Z Cs HE LZ В Саз) Za Zan ч (3.7) 


Which proves the theorem. 


Improved estimator of о? : The usual estimator of o= MP (5-r) is 
j 


given by ЕЙ т ;‚—2)® 
По ma e, * 


ON SAMPLING WITH UNEQUAL PROBABILITIES 
Corollary 1: Thus am estimator uniformly better than si is given by 


a 6-9 : 
me? n= | SAE рт] Caeo 89 
Corollary 2: Ат unbiased estimator of V(z,) is given by 
v(5)-— B— ХО у X Оцу): ... (3.9) 
4-1 Чт 


Since this estimator is quite complicated for use in large samples, Basu (1958) has 
suggested the use of 


1 
$ (а). ... (3.10 
| па) 679 К 
as an estimator of V(%). As it over-estimates У(2,), we are always оп the safe side 
to use (3.10) as our estimate. 


The estimators derived in this and preceding sections, though superior to the 
usually employed estimators, are not of much use for large scale sample surveys 
owing to their cumbersome nature. In the next section, we give simpler estimators of $Y$ and $Y^2$. These estimators, though less efficient than the estimators derived above, are superior to the usually employed estimators.


4. SIMPLE IMPROVED ESTIMATORS OF Y AND Y²


Let us suppose that the observed samples are segregated into groups of equal $p_i$'s. For instance, consider the problem of estimating the total yield of a crop from a sample of farms. Every sample farm is to be selected with probability proportional to its area. Here, if some crude approximation (say, correct to an acre) is used to measure their areas, we expect to get a number of farms with the same $p_i$ in the sample. In the sequel, by the p-value of a unit we mean the probability of selection associated with that unit. Let $p_{(1)}, p_{(2)}, \dots, p_{(k)}$ be the distinct p-values of the sample units, arranged in increasing order of magnitude. Let $n_{(i)}$ be the number of sample units having $p_{(i)}$ as their p-value. However, not all these $n_{(i)}$ units will be distinct; let $\nu_{(i)}$ be the number of distinct units among them. Now, if we arrange these $\nu_{(i)}$ distinct units in increasing order of their unit-indices and call them $a_{(i1)}, a_{(i2)}, \dots, a_{(i\nu_{(i)})}$, then it is not difficult to see that the statistic

$$T^* = \left[\{a_{(11)}, \dots, a_{(1\nu_{(1)})};\ n_{(1)}\}, \dots, \{a_{(k1)}, \dots, a_{(k\nu_{(k)})};\ n_{(k)}\}\right] \qquad (4.1)$$

is sufficient.


It should be noted that if we take away the ancillary statistics $n_{(1)}, \dots, n_{(k)}$ from the sufficient statistic $T^*$, then it reduces to the 'order-statistic' $T$ defined in the earlier section. The 'unnecessarily wide' sufficient statistic $T^*$ is used here for the purpose of deriving estimators of $Y$ and $Y^2$ that are much simpler (though somewhat less efficient) than those considered in the previous section. Theorems 3 and 4 below give simple improved estimators of $Y$ and $Y^2$ respectively.
Theorem 3: For any convex (downwards) loss function, an estimator uniformly better than $\bar z$ is given by

$$\bar z_R^* = \frac{1}{n}\sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}}{p_{(i)}}, \qquad (4.2)$$

where $\bar y_{(i)} = \frac{1}{\nu_{(i)}}\sum_{r=1}^{\nu_{(i)}} y_{(ir)}$, $y_{(ir)}$ being the y-value of unit $a_{(ir)}$.


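Proof: Evidently an estimator uniformly better than $\bar z$ is given by

$$E(\bar z \mid T^*) = E\left(\frac{y_{x_1}}{p_{x_1}} \,\middle|\, T^*\right), \qquad (4.3)$$

where $x_1$ denotes the unit selected at the first draw; the equality holds because the $n$ draws enter $\bar z$ symmetrically.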


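Further, the probability of getting a sample with a given $T^*$ is

$$P(T^*) = \frac{n!}{n_{(1)}!\cdots n_{(k)}!}\, p_{(1)}^{n_{(1)}}\cdots p_{(k)}^{n_{(k)}}\, C_{\nu_{(1)}}(n_{(1)})\cdots C_{\nu_{(k)}}(n_{(k)}), \qquad (4.4)$$

where

$$C_{\nu_{(i)}}(n_{(i)}) = \nu_{(i)}^{n_{(i)}} - \binom{\nu_{(i)}}{1}(\nu_{(i)}-1)^{n_{(i)}} + \cdots + (-1)^{\nu_{(i)}-1}\binom{\nu_{(i)}}{\nu_{(i)}-1} \qquad (i = 1, \dots, k),$$

and, since the remaining draws must still produce $T^*$,

$$P[x_1 = a_{(ir)} \mid T^*] = \frac{n_{(i)}}{n}\cdot\frac{C_{\nu_{(i)}}(n_{(i)}-1) + C_{\nu_{(i)}-1}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})} = \frac{n_{(i)}}{n\,\nu_{(i)}}, \qquad (4.5)$$

the last step following from $C_{\nu}(n) = \nu\,[C_{\nu}(n-1) + C_{\nu-1}(n-1)]$.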
п MI 


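From (4.3) and (4.5), it follows that

$$E(\bar z \mid T^*) = \sum_{i=1}^{k}\sum_{r=1}^{\nu_{(i)}}\frac{y_{(ir)}}{p_{(i)}}\cdot\frac{n_{(i)}}{n\,\nu_{(i)}} = \frac{1}{n}\sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}}{p_{(i)}},$$

which completes the proof of the theorem.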
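Theorem 3 can be checked mechanically on a toy population. The sketch below (exact rational arithmetic) enumerates every ordered with-replacement sample, computes $E(\bar z \mid T^*)$ by brute force and compares it with (4.2). The population values, the probabilities and the helper names (`z_bar`, `t_star`, `z_bar_star`, `check_theorem_3`) are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def z_bar(sample, y, p):
    """Usual estimator of the total Y: the mean of y/p over the n draws."""
    return sum(Fraction(y[u]) / p[u] for u in sample) / len(sample)


def t_star(sample, p):
    """T*: for each distinct p-value, the set of distinct units drawn and the draw count n_(i)."""
    groups = defaultdict(lambda: [set(), 0])
    for u in sample:
        groups[p[u]][0].add(u)
        groups[p[u]][1] += 1
    return frozenset((pv, frozenset(units), n_i) for pv, (units, n_i) in groups.items())


def z_bar_star(sample, y, p):
    """(4.2): (1/n) * sum over p-groups of n_(i) * ybar_(i) / p_(i)."""
    total = Fraction(0)
    for pv, units, n_i in t_star(sample, p):
        ybar = sum(Fraction(y[u]) for u in units) / len(units)
        total += n_i * ybar / pv
    return total / len(sample)


def check_theorem_3(y, p, n):
    """Brute force over all N**n ordered samples: compare E(z_bar | T*) with (4.2)."""
    prob_t = defaultdict(Fraction)   # P(T*)
    mass_z = defaultdict(Fraction)   # sum of P(s) * z_bar(s) over samples s with that T*
    mean_star = Fraction(0)          # E(z_bar_star); should equal the population total
    samples = list(product(range(len(y)), repeat=n))
    for s in samples:
        pr = Fraction(1)
        for u in s:
            pr *= p[u]               # with-replacement draws are independent
        key = t_star(s, p)
        prob_t[key] += pr
        mass_z[key] += pr * z_bar(s, y, p)
        mean_star += pr * z_bar_star(s, y, p)
    matches = all(mass_z[t_star(s, p)] / prob_t[t_star(s, p)] == z_bar_star(s, y, p)
                  for s in samples)
    return matches, mean_star


if __name__ == "__main__":
    y = [3, 5, 2, 7]
    p = [Fraction(1, 6), Fraction(1, 6), Fraction(1, 3), Fraction(1, 3)]
    print(check_theorem_3(y, p, n=3))
```

The enumeration grows as $N^n$, so this only checks the formula on tiny cases; it is not a way of computing the estimator in practice.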


A simple comparison of $\bar z_R^*$ and $\bar z$ will show that $\bar z_R^*$ will be superior to $\bar z$ if and only if the sample size is greater than two and at least three population units have the same p-value; otherwise $\bar z_R^*$ and $\bar z$ will be identical. It is not difficult to give a direct proof of the fact that $V(\bar z_R^*) \le V(\bar z)$. The strict sign of inequality holds only when the above condition is satisfied.
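Theorem 4: For any convex (downwards) loss function, an estimator uniformly better than

$$\widehat{Y^2} = \frac{1}{n(n-1)}\mathop{\sum\sum}_{i \ne j}\frac{y_i}{p_i}\cdot\frac{y_j}{p_j}$$

is given by

$$\widehat{Y^2}_R = \frac{1}{n(n-1)}\left[\left(\sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}}{p_{(i)}}\right)^2 - \sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}^2}{p_{(i)}^2} - \sum_{i=1}^{k} n_{(i)}(n_{(i)}-1)\,\frac{C_{\nu_{(i)}-1}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})}\cdot\frac{s_{(i)}^2}{p_{(i)}^2}\right], \qquad (4.6)$$

where $s_{(i)}^2 = \frac{1}{\nu_{(i)}-1}\sum_{r=1}^{\nu_{(i)}}(y_{(ir)} - \bar y_{(i)})^2$, and $\bar y_{(i)}$ and the $C$'s have meanings similar to those defined in (4.2) and (4.4).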


Proof: Obviously, an estimator uniformly better than $\widehat{Y^2}$ is given by

$$E(\widehat{Y^2} \mid T^*) = E\left(\frac{y_{x_1}}{p_{x_1}}\cdot\frac{y_{x_2}}{p_{x_2}} \,\middle|\, T^*\right). \qquad (4.7)$$


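Further, it can be shown that

$$P[x_1 = a_{(ir)},\, x_2 = a_{(js)} \mid T^*] = \frac{n_{(i)}\,n_{(j)}}{n(n-1)\,\nu_{(i)}\,\nu_{(j)}} \qquad (i \ne j),$$

$$P[x_1 = a_{(ir)},\, x_2 = a_{(is)} \mid T^*] = \frac{n_{(i)}(n_{(i)}-1)}{n(n-1)\,\nu_{(i)}(\nu_{(i)}-1)}\left[1 - \frac{C_{\nu_{(i)}}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})}\right] \qquad (r \ne s),$$

and

$$P[x_1 = x_2 = a_{(ir)} \mid T^*] = \frac{n_{(i)}(n_{(i)}-1)}{n(n-1)\,\nu_{(i)}}\cdot\frac{C_{\nu_{(i)}}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})}. \qquad (4.8)$$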
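Therefore,

$$E(\widehat{Y^2} \mid T^*) = \frac{1}{n(n-1)}\left[\mathop{\sum\sum}_{i \ne j}\frac{n_{(i)}\,n_{(j)}\,\bar y_{(i)}\,\bar y_{(j)}}{p_{(i)}\,p_{(j)}} + \sum_{i=1}^{k}\frac{n_{(i)}(n_{(i)}-1)}{p_{(i)}^2}\left\{\frac{C_{\nu_{(i)}}(n_{(i)}-1)}{\nu_{(i)}\,C_{\nu_{(i)}}(n_{(i)})}\sum_{r=1}^{\nu_{(i)}} y_{(ir)}^2 + \frac{1}{\nu_{(i)}(\nu_{(i)}-1)}\left[1 - \frac{C_{\nu_{(i)}}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})}\right]\mathop{\sum\sum}_{r \ne s} y_{(ir)}\,y_{(is)}\right\}\right]. \qquad (4.9)$$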

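Using the equality (the subscript $(i)$ being dropped)

$$\frac{C_{\nu}(n-1)}{\nu\,C_{\nu}(n)}\sum_{r=1}^{\nu} y_r^2 + \frac{1}{\nu(\nu-1)}\left[1 - \frac{C_{\nu}(n-1)}{C_{\nu}(n)}\right]\mathop{\sum\sum}_{r \ne s} y_r\,y_s = \bar y^2 - \frac{C_{\nu-1}(n-1)}{C_{\nu}(n)}\,s^2,$$

which follows from $C_{\nu-1}(n-1)/C_{\nu}(n) = 1/\nu - C_{\nu}(n-1)/C_{\nu}(n)$,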
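and simplifying (4.9), we get

$$E(\widehat{Y^2} \mid T^*) = \frac{1}{n(n-1)}\left[\left(\sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}}{p_{(i)}}\right)^2 - \sum_{i=1}^{k}\frac{n_{(i)}\,\bar y_{(i)}^2}{p_{(i)}^2} - \sum_{i=1}^{k} n_{(i)}(n_{(i)}-1)\,\frac{C_{\nu_{(i)}-1}(n_{(i)}-1)}{C_{\nu_{(i)}}(n_{(i)})}\cdot\frac{s_{(i)}^2}{p_{(i)}^2}\right].$$

This completes the proof.*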
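In the same spirit, the following sketch compares $E(\widehat{Y^2} \mid T^*)$, obtained by enumerating all ordered samples of a small hypothetical population, with the closed form (4.6); `C(nu, n)` implements the inclusion-exclusion count of (4.4). All names and numbers are assumptions made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb


def C(nu, n):
    """Inclusion-exclusion count of length-n sequences over nu units using every unit."""
    return sum((-1) ** j * comb(nu, j) * (nu - j) ** n for j in range(nu + 1))


def groups_of(sample, p):
    """T* as a dict: p-value -> [set of distinct units drawn, number of draws n_(i)]."""
    g = defaultdict(lambda: [set(), 0])
    for u in sample:
        g[p[u]][0].add(u)
        g[p[u]][1] += 1
    return g


def key_of(sample, p):
    return frozenset((pv, frozenset(units), n_i) for pv, (units, n_i) in groups_of(sample, p).items())


def y2_usual(sample, y, p):
    """Unbiased estimator of Y^2: average of z_i * z_j over ordered pairs of distinct draws."""
    z = [Fraction(y[u]) / p[u] for u in sample]
    n = len(z)
    return (sum(z) ** 2 - sum(v * v for v in z)) / (n * (n - 1))


def y2_improved(sample, y, p):
    """(4.6): the closed form of E(y2_usual | T*)."""
    n = len(sample)
    a = b = d = Fraction(0)
    for pv, (units, n_i) in groups_of(sample, p).items():
        nu = len(units)
        ybar = sum(Fraction(y[u]) for u in units) / nu
        # s^2 is only needed when nu > 1; for nu = 1 its coefficient below is zero
        s2 = sum((y[u] - ybar) ** 2 for u in units) / (nu - 1) if nu > 1 else Fraction(0)
        a += n_i * ybar / pv
        b += n_i * ybar ** 2 / pv ** 2
        d += n_i * (n_i - 1) * Fraction(C(nu - 1, n_i - 1), C(nu, n_i)) * s2 / pv ** 2
    return (a * a - b - d) / (n * (n - 1))


def check_theorem_4(y, p, n):
    """True if the brute-force conditional expectation matches (4.6) for every sample."""
    prob_t, mass = defaultdict(Fraction), defaultdict(Fraction)
    samples = list(product(range(len(y)), repeat=n))
    for s in samples:
        pr = Fraction(1)
        for u in s:
            pr *= p[u]
        k = key_of(s, p)
        prob_t[k] += pr
        mass[k] += pr * y2_usual(s, y, p)
    return all(mass[key_of(s, p)] / prob_t[key_of(s, p)] == y2_improved(s, y, p)
               for s in samples)
```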
Corollary 3: It is easy to see that

$$s_{zR}^{*2} = E(s_z^2 \mid T^*) = \frac{1}{n}\sum_{i=1}^{k}\frac{n_{(i)}}{p_{(i)}^2}\left[\bar y_{(i)}^2 + \frac{\nu_{(i)}-1}{\nu_{(i)}}\,s_{(i)}^2\right] - \widehat{Y^2}_R, \qquad (4.10)$$

where $\widehat{Y^2}_R$ denotes the estimator (4.6), is a simple improved estimator of $\sigma^2$.
Corollary 4: An unbiased estimator of $V(\bar z_R^*)$ is given by

$$v(\bar z_R^*) = (\bar z_R^*)^2 - \widehat{Y^2}_R, \qquad (4.11)$$

where $\bar z_R^*$ and $\widehat{Y^2}_R$ denote the estimators (4.2) and (4.6).


However, in practice it seems reasonable to use

$$\frac{s_z^2}{n} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \bar z\right)^2 \qquad (4.12)$$

as an estimator of $V(\bar z_R^*)$: first, because it is simple to compute; secondly, because it is always non-negative. Moreover, in using this we are always on the safe side, as it over-estimates the variance of $\bar z_R^*$.


* The estimator (4.6) requires the computation of the ratio $C_{\nu-1}(n-1)/C_{\nu}(n)$. Values of $C_{\nu-1}(n-1)/C_{\nu}(n)$ can be obtained from the relation

$$\frac{C_{\nu-1}(n-1)}{C_{\nu}(n)} = \frac{1}{\nu} - \frac{C_{\nu}(n-1)}{C_{\nu}(n)}.$$

Values of $C_{\nu}(n-1)/C_{\nu}(n)$ have been tabulated for all $\nu$ and $n = 1$ to $50$ in a paper published elsewhere in the same issue (Pathak, 1962), pp. 287-302.
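A minimal sketch of the quantities in this footnote, assuming $C_\nu(n)$ is the inclusion-exclusion count of (4.4) (the $j = \nu$ term, $0^n$, vanishes for $n \ge 1$ and is kept only for uniformity). It checks that the relation above agrees with the direct ratio, and with the recurrence $C_\nu(n) = \nu[C_\nu(n-1) + C_{\nu-1}(n-1)]$, from which the relation follows on dividing by $\nu C_\nu(n)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def C(nu, n):
    """C_nu(n) = sum_j (-1)^j binom(nu, j) (nu - j)^n: sequences of n draws covering all nu units."""
    return sum((-1) ** j * comb(nu, j) * (nu - j) ** n for j in range(nu + 1))


def ratio_direct(nu, n):
    """C_{nu-1}(n-1) / C_nu(n), computed directly (requires n >= nu >= 1)."""
    return Fraction(C(nu - 1, n - 1), C(nu, n))


def ratio_from_table(nu, n):
    """The same ratio obtained from the tabulated C_nu(n-1)/C_nu(n) via the footnote's relation."""
    return Fraction(1, nu) - Fraction(C(nu, n - 1), C(nu, n))


if __name__ == "__main__":
    for nu in range(1, 5):
        print(nu, [ratio_from_table(nu, n) for n in range(nu, nu + 4)])
```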
5. A REMARK ON RAO-BLACKWELL THEOREM


Let $X$ be the sample space of all possible outcomes $x$. Let $S$ be the field of subsets of $X$ on which a set $P = \{p\}$ of probability measures is defined. Any statistic $T(x)$ [an $S$-measurable function defined from $X$ onto another space $Y$] generates a subfield $S_T \subset S$. The statistic $T$, or the subfield $S_T$, is called sufficient for $P$ (Bahadur, 1954) if, corresponding to each $S$-measurable set $A$, there exists an $S_T$-$P$-integrable function $\phi_A$ such that

$$P[x \in A_0 \cap A] = \int_{A_0 \cap A} dp = \int_{A_0}\phi_A(x)\,dp \quad \text{for } A_0 \in S_T,\ p \in P. \qquad (5.1)$$


It is known that any estimator $g(x)$ based on the sample $x$ can be uniformly improved by taking the conditional expectation of $g(x)$ given a sufficient subfield. If several sufficient subfields are available, the minimum condensation of $g(x)$ is mostly obtained by employing the minimal sufficient subfield ($S_1$, say). There are situations, e.g., in sample surveys, where this minimal condensation is unwieldy and it is not possible for practical reasons to use this condensation. But it is sometimes possible to divide $X$ into subsets $K$ ($\in S_1$) and $K'$ such that the condensation is simple on $K$ and unwieldy on $K'$. In such cases a simpler condensation can be achieved with the help of some other subfield, $S_2$, which contains the minimal one. It follows as a consequence of Theorem 5 (stated below) that a condensation smaller than that of $S_2$ (but larger than that of $S_1$) can be obtained by employing $S_1K + S_2K'$ as the sufficient subfield, and this condensation will still have the merit of simplicity.


Theorem 5: Let $S_1$ and $S_2$ be two sufficient subfields of $(X, S, P)$, and $K$ a set common to $S_1$ and $S_2$. Then the subfield*

$$S_3 = S_1K + S_2K' \qquad (5.2)$$

is also sufficient.
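Proof: Since $S_1$ and $S_2$ are two sufficient subfields, there exist, for each $S$-measurable set $A$, an $S_1$-$P$-integrable function $\phi_{1A}$ and an $S_2$-$P$-integrable function $\phi_{2A}$ such that

$$\int_{A_i \cap A} dp = \int_{A_i}\phi_{iA}\,dp \quad \text{for } A_i \in S_i\ (i = 1, 2),\ p \in P. \qquad (5.3)$$

Now for any $A_3 = A_1K + A_2K' \in S_3$ and for each $A \in S$, applying (5.3) to $A_1K \in S_1$ and $A_2K' \in S_2$ gives

$$\int_{A_3 \cap A} dp = \int_{A_1K}\phi_{1A}\,dp + \int_{A_2K'}\phi_{2A}\,dp = \int_{A_3}\phi_{3A}\,dp, \qquad (5.4)$$

where $\phi_{3A} = \phi_{1A}\chi_K + \phi_{2A}\chi_{K'}$, $\chi_{K'} = 1 - \chi_K$, and $\chi_K$ is the characteristic function of the set $K$. It is easily seen that $\phi_{1A}\chi_K$ and $\phi_{2A}\chi_{K'}$ are both $S_3$-$P$-integrable, and thus $\phi_{3A}$ is $S_3$-$P$-integrable. This with (5.4) implies that $S_3$ is a sufficient subfield.

Corollary 5: If $S_1 \subset S_2$, then $S_1 \subset S_3 \subset S_2$.

* $S_1K + S_2K'$ denotes the class of sets of the form $S_3 = \{A_1K + A_2K'\}$, $A_i \in S_i$ $(i = 1, 2)$.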
Corollary 6: Let $S_1, S_2, \dots, S_k$ be $k$ sufficient subfields. If $A_1, A_2, \dots, A_k$ $\left[\bigcup_{i=1}^{k} A_i = X,\ A_iA_j = \emptyset\ (i \ne j = 1, \dots, k)\right]$ are $k$ sets such that $A_i \in S_j$ $(i, j = 1, \dots, k)$, then

$$S^* = A_1S_1 + A_2S_2 + \cdots + A_kS_k \qquad (5.5)$$

is also a sufficient subfield.
Corollary 7: Let $g(x)$ be an estimator based on $x$. Let $S_1$ and $S_2$ be two sufficient subfields, and $K$ a set common to $S_1$ and $S_2$. Then for any convex (downwards) loss function, an estimator uniformly better than $g(x)$ is given by

$$E[g(x) \mid S_3] = \begin{cases} E[g(x) \mid S_1] & \text{if } x \in K,\\ E[g(x) \mid S_2] & \text{otherwise,}\end{cases} \qquad (5.6)$$

where $S_3 = S_1K + S_2K'$.

This result is useful in estimation problems when the improved estimator $E[g(x) \mid S_1]$ is difficult to compute for all $x \in X$. In such cases, this result may be utilized by employing the subfield $S_1$ and another subfield $S_2$ such that $E[g(x) \mid S_1]$ is simple to compute when $x \in K$ and $E[g(x) \mid S_2]$ is simple to compute when $x \in K'$ ($K \in S_i$, $i = 1, 2$), and using (5.6) as the improved estimator. The resulting estimator will still be better than $g(x)$ and, in addition, will be simple to compute. Further, if $S_1 \subset S_2$, this estimator will also be better than $E[g(x) \mid S_2]$. For completeness, a straightforward generalisation of Theorem 5 to the case of a countable number of sufficient subfields is given below.


Theorem 6: Let $\{S_i\}$ be a countable number of sufficient subfields. Let $\{K_i\}$ be a sequence of mutually exclusive and exhaustive sets such that $K_i \in S_j$ $(i, j = 1, 2, \dots)$. Then

$$S^* = \sum_{i=1}^{\infty} K_iS_i$$

is a sufficient subfield.


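Proof: Since $S_i$ is a sufficient subfield, there exists for each $A \in S$ an $S_i$-$P$-integrable function $\phi_{iA}$ such that

$$P[x \in A_i \cap A] = \int_{A_i \cap A} dp = \int_{A_i}\phi_{iA}(x)\,dp \quad \text{for } A_i \in S_i\ (i = 1, \dots, \infty),\ p \in P. \qquad (5.7)$$

Now for any $A^* = \sum_i A_iK_i \in S^*$ and for each $A \in S$, we have from (5.7)

$$P[x \in A^* \cap A] = \lim_{m \to \infty} P\left[x \in \left(\sum_{i=1}^{m} A_iK_i\right) \cap A\right] = \lim_{m \to \infty}\sum_{i=1}^{m}\int_{A_iK_i}\phi_{iA}\,dp = \lim_{m \to \infty}\int_{A^*}\left[\sum_{i=1}^{m}\phi_{iA}\,\chi_{K_i}(x)\right]dp. \qquad (5.8)$$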
Since $0 \le f_m = \sum_{i=1}^{m}\phi_{iA}\,\chi_{K_i}(x) \le 1$ a.e. $(P)$, $\lim_{m \to \infty} f_m = f$ exists a.e. $(P)$. We thus have, by Lebesgue's monotone convergence theorem,

$$P[x \in A^* \cap A] = \int_{A^*} f(x)\,dp. \qquad (5.9)$$

Since for each $m$, $f_m$ is $S^*$-$P$-integrable, $f(x)$ is $S^*$-$P$-integrable. Hence the theorem.

In the following section Theorem 5 is used in sampling with unequal probabilities to derive some simple improved estimators of $Y$.


6. APPLICATION TO SAMPLING WITH UNEQUAL PROBABILITIES 


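We have seen that $\bar z_R$ is uniformly better than $\bar z$. For $n = 3$, $\bar z_R$ can be expressed as

$$\bar z_R = \begin{cases} \dfrac{y_{(1)}}{p_{(1)}} & \text{if } \nu = 1,\\[1.5ex] \dfrac{1}{3}\left[\dfrac{y_{(1)}}{p_{(1)}} + \dfrac{y_{(2)}}{p_{(2)}} + \dfrac{y_{(1)} + y_{(2)}}{p_{(1)} + p_{(2)}}\right] & \text{if } \nu = 2,\\[1.5ex] \dfrac{1}{3}\left[\dfrac{y_{(1)}}{p_{(1)}} + \dfrac{y_{(2)}}{p_{(2)}} + \dfrac{y_{(3)}}{p_{(3)}}\right] & \text{if } \nu = 3.\end{cases} \qquad (6.1)$$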


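It is not simple to compute $\bar z_R$ when $n > 3$, owing to the cumbersome computation of the $C$'s. However, if in a sample of size $n$, $\nu = (n-1)$, then $\bar z_R$ is expressible in the simple form

$$\bar z_R = \frac{1}{n}\left[\sum_{r=1}^{n-1}\frac{y_{(r)}}{p_{(r)}} + \frac{\sum_{r=1}^{n-1} y_{(r)}}{\sum_{r=1}^{n-1} p_{(r)}}\right]. \qquad (6.2)$$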
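Formulas (6.1) and (6.2) can be checked against a direct computation of $E(\bar z \mid T)$, where $T$ records only which units were drawn. The sketch below assumes a hypothetical four-unit population and uses exact fractions; `z_bar_R_bruteforce` enumerates every ordered sample that uses exactly the given units, and `z_bar_R_simple` is (6.2). Both names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod


def z_bar_R_bruteforce(distinct, y, p, n):
    """E(z_bar | T), T = set of distinct units, by enumerating every ordered sample
    of size n that uses exactly those units, weighted by its probability."""
    units = sorted(distinct)
    num = den = Fraction(0)
    for s in product(units, repeat=n):
        if set(s) != set(units):
            continue                                   # the sample must contain every unit of T
        pr = prod((p[u] for u in s), start=Fraction(1))
        num += pr * sum(Fraction(y[u]) / p[u] for u in s) / n
        den += pr
    return num / den


def z_bar_R_simple(distinct, y, p):
    """(6.2): closed form, valid when nu = n - 1 (exactly one unit drawn twice)."""
    n = len(distinct) + 1
    ratio_sum = sum(Fraction(y[u]) / p[u] for u in distinct)
    pooled = Fraction(sum(y[u] for u in distinct)) / sum(p[u] for u in distinct)
    return (ratio_sum + pooled) / n
```

For $\nu = n$ the brute force reduces to the plain mean of $y/p$, the last line of (6.1).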


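As a direct consequence of Corollary 7, it follows that a simple estimator uniformly better than $\bar z_R^*$ of (4.2) (and hence better than $\bar z$) is given by

$$\bar z_R^{**} = \begin{cases} \bar z_R \text{ as in (6.2)} & \text{if } \nu = (n-1),\\ \bar z_R^* & \text{otherwise.}\end{cases} \qquad (6.3)$$

Two points in favour of utilising $\bar z_R^{**}$ are: (i) it is as simple as $\bar z$ or $\bar z_R^*$, and (ii) it is more efficient than $\bar z_R^*$.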
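The switching rule (6.3) is easy to state in code. This is a sketch under the assumption that a sample is given as the ordered list of unit indices drawn and that `p` holds exact selection probabilities; the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def z_bar_R_simple(sample, y, p):
    """(6.2), used when the n draws contain exactly n - 1 distinct units."""
    units = set(sample)
    ratio_sum = sum(Fraction(y[u]) / p[u] for u in units)
    pooled = Fraction(sum(y[u] for u in units)) / sum(p[u] for u in units)
    return (ratio_sum + pooled) / len(sample)


def z_bar_star(sample, y, p):
    """(4.2): group the draws by p-value and average y over the distinct units in each group."""
    groups = defaultdict(lambda: [set(), 0])
    for u in sample:
        groups[p[u]][0].add(u)
        groups[p[u]][1] += 1
    total = Fraction(0)
    for pv, (units, n_i) in groups.items():
        ybar = Fraction(sum(y[u] for u in units), len(units))
        total += n_i * ybar / pv
    return total / len(sample)


def z_bar_R_double_star(sample, y, p):
    """(6.3): the exact E(z_bar | T) when nu = n - 1, otherwise the T*-based estimator."""
    n, nu = len(sample), len(set(sample))
    if nu == n - 1:
        return z_bar_R_simple(sample, y, p)
    return z_bar_star(sample, y, p)
```

The rule only changes which conditional expectation is evaluated; both branches are cheap, which is the point of Corollary 7.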
Another simple improved estimator of $Y$: Such an estimator can be derived by using the following sufficient statistic
T, = Кеа» 2w) -o (ор 20] 
Ao if A2 
where Qu = 3 
ТУ. vua pU erp Tv). 
and $\lambda_{(i)}$ is the number of times unit $i$ is included in the sample.
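Assuming, without any loss of generality, that

$$\alpha_{(i)} = 1 \text{ if } i = 1, \dots, k, \qquad \alpha_{(i)} > 1 \text{ if } i = k+1, \dots, \nu, \qquad \sum_{i=1}^{\nu}\alpha_{(i)} = m, \qquad (6.4)$$

it can be shown that an estimator better than $\bar z$ is given by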


4 1 k 
25 = ИЕ =— aza + È dara | e (6:0) 
Fry = EET. [2 at 2 tur j| 
where 4 = БҮЛӨЛҮҮ summations X, and Xj have been defined in (2.3)
Zy Pa +» Pint) 
and are taken over pq, ..., P 
In practical situations, it is much simpler to compute this estimator than to compute $\bar z_R$. For $m-k = 1$ and $m-k = 2$, this estimator is given by


fr Е 
1 в Z yo д 
= È a mat = if m—k=1 
o X Po 
Z =з i=1 
Ке) ... (6.6) 


1га 240, р) 1: ES 
Ec aal sl f. —k= 2 
L^ [2 9 Zot ig) ] m 


k 
where kly, р) = X Yu ро (yw) Pa) and Ар, p) is defined similarly. 
i=1 


In general, when $(m-k)$ is large, this estimator may also involve some extra computation. If the statistician is not in favour of even this extra computation, the author, as a consequence of Corollary 7, recommends the following improved procedure of estimation:

(i) use $\bar z_R^*$ if $(m-k) > 2$; ... (6.7)
(ii) use $E[\bar z \mid T_1]$ if $(m-k) \le 2$.
For estimating the variance of these estimators, the author suggests (4.12) as an estimator.
7. CONCLUDING REMARK

In the case of large samples, if one is interested in altogether dispensing with the extra computation, the observed sample of size $n$ may be divided into sub-samples of sizes $n_1, n_2, \dots, n_j$, etc. ($\sum n_i = n$ and $n_i = 3, 4$ or $5$, etc.). This division should, however, be independent of the sample observations. Each sub-sample may then be treated as a sample in itself, and simple improved estimators may be obtained for each sub-sample by using the estimators given in the preceding section. The over-all improved estimator can now be obtained by averaging the estimators obtained from each sub-sample with weights proportional to the sub-sample sizes.
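A minimal sketch of this concluding remark, assuming all sub-samples have size three so that the exact form (6.1) applies to each one. The split into consecutive blocks of draws is fixed before looking at the data, as the remark requires. Function names and data are hypothetical.

```python
from fractions import Fraction


def z_bar_R_n3(sub, y, p):
    """(6.1): E(z_bar | T) for a sub-sample of exactly three draws."""
    units = sorted(set(sub))
    ratios = [Fraction(y[u]) / p[u] for u in units]
    if len(units) == 1:
        return ratios[0]
    if len(units) == 2:
        pooled = Fraction(y[units[0]] + y[units[1]]) / (p[units[0]] + p[units[1]])
        return (ratios[0] + ratios[1] + pooled) / 3
    return sum(ratios) / 3


def subsample_estimate(sample, y, p, size=3):
    """Split the ordered draws into consecutive blocks of `size` (a split fixed in advance),
    estimate Y in each block and average with weights proportional to block sizes."""
    if size != 3 or len(sample) % size:
        raise ValueError("this sketch only handles blocks of exactly three draws")
    blocks = [tuple(sample[i:i + size]) for i in range(0, len(sample), size)]
    return sum(z_bar_R_n3(b, y, p) * len(b) for b in blocks) / len(sample)
```

Each block is itself a with-replacement sample of size three, so each block estimate, and hence their weighted average, is unbiased for $Y$.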
MISCELLANEOUS


AN EXISTENCE THEOREM IN SAMPLING THEORY* 


By T. V. HANUMANTHA RAO 
Indian Statistical Institute 


SUMMARY. A (1,1) correspondence is established here between sampling designs and sampling schemes for sampling from a finite population. This result enables us to search for optimum sampling procedures in any particular case through a unified general set-up.


1. INTRODUCTION
Let a finite population of $N$ units be given by

$$U_1, U_2, \dots, U_i, \dots, U_N. \qquad (1.1)$$
We give the following definitions. 
Definition 1: A sample 's' from the above population is an ordered sequence

$$s = (U_{i_1}, U_{i_2}, \dots, U_{i_n}), \quad 1 \le i_t \le N \text{ for } 1 \le t \le n, \qquad (1.2)$$

where the $i_t$'s need not necessarily be distinct, and $n$ is called the size of the sample 's'.
Definition 2: A sampling scheme is a process of selecting units one by one from
the population (1.1) with pre-determined sets of probabilities of selection for individual units 
at each of the draws. 
Definition 3: A sample design $D$ is an arbitrary collection 'S' of samples 's' with an arbitrary probability measure $P$ defined on it, according to which the samples should be drawn. We can write explicitly

$$D = D(S, P), \qquad (1.3)$$

where $\sum_{s \in S} P_s = 1$.

It can be seen that this is the most general definition of a sample design.


It is known that any sampling scheme results in a unique sampling design, which is fully determined by it. That the converse also holds good is shown in Section 2. We restrict our proof to the cases of practical importance when all the samples in the given design are of sizes $\le m$, $m$ being a fixed positive integer. However, the extension to the cases where this condition is not satisfied, such as sequential procedures, should not offer much difficulty.

A sample design is said to be completely specified if all possible samples, including their permutations, together with their respective probabilities of selection are specified. Examples of partially specified designs are (i) specification of all possible unordered samples with their respective probabilities of selection, and (ii) specification of the probabilities of inclusion in the sample of each of the units. Corresponding to any partially specified sample design, there may be many possibly different completely specified sampling designs.

* Read before the Forty-seventh session of the Indian Science Congress, January 1960, Bombay.


2. MAIN RESULTS


Theorem: There is a one-to-one correspondence between completely specified sample designs and sampling schemes.

Proof: We introduce a new unit $U_0$, called the null unit, into our population. The occurrence of $U_0$ in a draw means that none of the units belonging to (1.1) is selected in that particular draw. We replace any sample

$$s = (U_{i_1}, U_{i_2}, \dots, U_{i_n}), \quad \text{where } n < m,$$

by

$$s' = (U_{i_1}, U_{i_2}, \dots, U_{i_n}, U_0, \dots, U_0),$$

such that the size of $s'$ is equal to $m$. We attach the same probability to $s'$ as to the corresponding $s$. Let $S'$ be the set of all $s'$ and $P'$ be the probability measure on $S'$, as constructed above.
For any sampling scheme, let $p^{(n)}_{i_n} \mid (i_1, i_2, \dots, i_{n-1})$ denote the probability of selecting $U_{i_n}$ in the $n$-th draw, given that the first $(n-1)$ draws resulted in the selections of $U_{i_1}, U_{i_2}, \dots, U_{i_{n-1}}$ successively. We shall now find $p$'s such that the resulting sample design is the given design $D$. It would be sufficient to consider instead the design $D' = (S', P')$. Then, let $S_{i_1}$ be the subset of $S'$ consisting of all samples $s'$ for which the first unit is $U_{i_1}$. Similarly, let $S_{i_1i_2}$ be the set of all samples which have $U_{i_1}$ as their first unit and $U_{i_2}$ as their second unit. The definitions of $S_{i_1i_2i_3}$, etc., are now similar. In all the above definitions, the indices $i_1, i_2, \dots$ can be any integers, not necessarily distinct, of the index set $0, 1, \dots, N$.
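Let us define $p^{(1)}_{i_1}$ by

$$p^{(1)}_{i_1} = \sum_{s' \in S_{i_1}} P'_{s'}, \qquad 1 \le i_1 \le N. \qquad (2.1)$$

Clearly, $\sum_{i_1=1}^{N} p^{(1)}_{i_1} = 1$, since $\bigcup_{i_1=1}^{N} S_{i_1} = S'$ and $\sum_{s' \in S'} P'_{s'} = 1$.

This defines the probabilities of selection of all units in the first draw. Let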


2u v P» 
es; $ if a (2.2) 
= 152 x i p? 20 es ^ 
(2 Ru — Я 
{oe i} = 4 a d 
0 otherwise 


fr 1| «á« N, and 0i <N. 
Since $\bigcup_{i_2=0}^{N} S_{i_1i_2} = S_{i_1}$, it is clear that

$$\sum_{i_2=0}^{N}\left\{p^{(2)}_{i_2} \mid (i_1)\right\} = 1 \quad \text{for } 1 \le i_1 \le N,$$

so that (2.2) completely defines the probabilities for selection in the second draw for all possible outcomes of the first draw. We observe here that the null unit cannot be the first unit of a sample $s'$, since $s'$ is the augmentation of a sample 's' of the design $D$.


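Similarly, we define the probabilities of selection in the third draw by

$$p^{(3)}_{i_3} \mid (i_1, i_2) = \begin{cases} \dfrac{\sum_{s' \in S_{i_1i_2i_3}} P'_{s'}}{\sum_{s' \in S_{i_1i_2}} P'_{s'}} & \text{if } \sum_{s' \in S_{i_1i_2}} P'_{s'} > 0,\\[1.5ex] 0 & \text{otherwise,}\end{cases}$$

where the notation is clear. As before, we have

$$\sum_{i_3=0}^{N}\left\{p^{(3)}_{i_3} \mid (i_1, i_2)\right\} = 1 \quad \text{for } 1 \le i_1 \le N,\ 0 \le i_2 \le N.$$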


This process can be continued until the $m$-th stage, where it stops finally. This gives a well-defined sampling scheme giving rise to the design $D'$, which is equivalent to the design $D$. That there is just one scheme giving rise to $D'$ is clear, because at each stage all the conditional probabilities $p^{(n)}_{i_n} \mid (i_1, i_2, \dots, i_{n-1})$ should agree for both the schemes. This completes the proof of our assertion.
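The construction in the proof translates directly into a few lines of code. The sketch below (exact fractions; units numbered $1, \dots, N$ and `0` standing for the null unit $U_0$) pads a design with the null unit and returns the conditional draw probabilities of (2.1), (2.2) and their continuations. The design used in the check has five equally likely samples, three of which start with $U_1$, as in Remark 1; all names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

NULL = 0  # index of the null unit U_0; real units are 1..N


def augment(design, m):
    """Pad each sample with U_0 up to size m, giving the design D' = (S', P')."""
    padded = defaultdict(Fraction)
    for sample, prob in design.items():
        padded[tuple(sample) + (NULL,) * (m - len(sample))] += Fraction(prob)
    return dict(padded)


def generating_scheme(design):
    """Return (m, p_next), where p_next(history, unit) is the conditional probability of
    drawing `unit` next given the units drawn so far, as in (2.1), (2.2), ..."""
    m = max(len(s) for s in design)
    prefix_mass = defaultdict(Fraction)   # total P' of the samples beginning with a given prefix
    for s, prob in augment(design, m).items():
        for t in range(m + 1):
            prefix_mass[s[:t]] += prob

    def p_next(history, unit):
        history = tuple(history)
        denom = prefix_mass.get(history, Fraction(0))
        if denom == 0:
            return Fraction(0)            # the "0 otherwise" branch of (2.2)
        return prefix_mass.get(history + (unit,), Fraction(0)) / denom

    return m, p_next
```

Multiplying the conditional probabilities along a padded sample telescopes back to its design probability, which is the content of the theorem.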


Remark 1: The introduction of the null unit in the population ensures that at each draw we deal with a probability measure on the population

$$U_0, U_1, \dots, U_N.$$

This removes an undesirable ambiguity in some cases where we come across a draw in which there is a positive probability of no unit getting selected. For example, consider the population of 3 units $U_1$, $U_2$ and $U_3$. Let $S$ be the following set of samples:
$(U_1)$; $(U_1, U_2)$; $(U_1, U_3)$; $(U_2, U_3)$ and $(U_3, U_2, U_1)$, with probability $1/5$ attached to each sample. Then

$$\sum_{i_2=1}^{3} p^{(2)}_{i_2} \mid (1) = \frac{1/5 + 1/5}{3/5} = \frac{2}{3},$$

and with probability $1/3$ we do not get any unit into the sample at the second draw, which is an ambiguous situation. However, on introducing the null unit $U_0$, the sample $(U_1, U_0, U_0)$ carries a probability $1/5$, so that

$$\sum_{i_2=0}^{3} p^{(2)}_{i_2} \mid (1) = \frac{3/5}{3/5} = 1,$$

as it should be.


Remark 2: We have seen above that for any general design there is a unique scheme which results in that design. We may call this scheme "the generating scheme of $D$" and $D$ "the generated design" of the scheme. However complicated a sampling design may be, we can always consider, conceptually at least, a scheme of drawing units one by one which gives rise to the given design. Thus any sampling method which does not satisfy the definition of sampling scheme as given in Section 1 can be treated as equivalent to a suitably chosen sampling scheme.
A NOTE ON MIXING PROCESSES


By K. R. PARTHASARATHY 
Indian Statistical Institute
SUMMARY. In this paper it is shown that, in the space of discrete stationary stochastic processes under the weak topology, finite Markov chains are dense and the set of weakly mixing processes is a dense $G_\delta$.

The purpose of this note is to show that any real valued discrete stationary 
process can be approximated by means of strongly mixing Markov Chains and deduce 
that the set of weakly mixing processes is a set of the second category under the weak 
topology. This answers a question raised by Kolmogorov (1962). 


DEFINITIONS AND NOTATIONS
Let $R$ denote the real line and $R^I$ the countable product of $R$ over all the integers. $T$ denotes the shift transformation. $\mathfrak{M}$ is the space of all distributions on $R^I$ which are invariant under $T$. $\mathfrak{M}$ is assigned the weak topology, which makes it a complete separable metric space.
THEOREMS 


Theorem 1: The set of strongly mixing Markov chains is everywhere dense in $\mathfrak{M}$.

Proof: Consider points of the type $x$ such that $T^kx = x$ for some $k$. The smallest $k$ for which this is valid is called the period of $x$. The measure which assigns mass $1/k$ to the points $x, Tx, \dots, T^{k-1}x$ is a periodic ergodic measure [as described by the author (Parthasarathy, 1961)]. Such measures have been proved to be dense in $\mathfrak{M}$ (cf. Parthasarathy, 1961). From this we deduce the following.

Consider sequences of the following type

$$x = (\dots, x_0, x_1, \dots, x_{k-1}, x_0, x_1, \dots),$$

where the numbers $x_0, x_1, \dots, x_{k-1}$ are all distinct and $x_{j+k} = x_j$, and measures with mass $1/k$ at $x, Tx, \dots, T^{k-1}x$. Such measures are dense in $\mathfrak{M}$.
We shall now approximate measures of this type by strongly mixing Markov chains. To this end, we consider the Markov chains with states $x_0, x_1, \dots, x_{k-1}$, transition matrix

$$\begin{pmatrix} \epsilon/(k-1) & 1-\epsilon & \epsilon/(k-1) & \cdots & \epsilon/(k-1)\\ \epsilon/(k-1) & \epsilon/(k-1) & 1-\epsilon & \cdots & \epsilon/(k-1)\\ \vdots & & & \ddots & \vdots\\ 1-\epsilon & \epsilon/(k-1) & \epsilon/(k-1) & \cdots & \epsilon/(k-1)\end{pmatrix}$$

and initial distribution $(1/k, 1/k, \dots, 1/k)$ for $x_0, x_1, \dots, x_{k-1}$. These are strongly mixing Markov chains. As $\epsilon \to 0$ these Markov chains converge weakly to the distribution with mass $1/k$ at the points $x, Tx, \dots, T^{k-1}x$. This proves Theorem 1.
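A quick numerical look at the chains used in this proof, as a sketch: the matrix below has rows and columns summing to one (so the uniform initial distribution is stationary), all its entries are positive for $0 < \epsilon < 1$, and its powers approach the matrix with every entry $1/k$, the behaviour behind the strong mixing property; as $\epsilon \to 0$ it approaches the cyclic shift. Plain Python lists are used, and the helper names are hypothetical.

```python
def cyclic_chain(k, eps):
    """Transition matrix of the proof: row i puts 1 - eps on state (i + 1) mod k
    and eps/(k - 1) on every other state (requires k >= 2)."""
    if k < 2:
        raise ValueError("need at least two states")
    return [[1 - eps if j == (i + 1) % k else eps / (k - 1) for j in range(k)]
            for i in range(k)]


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(a, n):
    """a**n by repeated squaring."""
    result = [[1.0 if i == j else 0.0 for j in range(len(a))] for i in range(len(a))]
    while n:
        if n & 1:
            result = mat_mul(result, a)
        a = mat_mul(a, a)
        n >>= 1
    return result
```

For this circulant matrix the non-unit eigenvalues have modulus $|1 - \epsilon k/(k-1)|$, so the convergence of the powers slows down as $\epsilon \to 0$, consistent with the limit being the non-mixing periodic measure.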

Corollary: In particular, the set of weakly mixing distributions is everywhere dense.

Theorem 2: The set of weakly mixing distributions is a set of the second category in $\mathfrak{M}$.

Proof: Let $\mathfrak{M}_w$ denote the set of weakly mixing distributions in $\mathfrak{M}$. It is enough to show that $\mathfrak{M}_w$ is a $G_\delta$. We shall just indicate the proof, since the arguments go exactly on the same lines as in the case of ergodic measures (Parthasarathy, 1961). We choose a metric in $R$ such that the space of bounded uniformly continuous functions is separable under the uniform topology. Such a possibility is shown by Varadarajan (1958). A distribution $\mu$ is weakly mixing if and only if, for every (real valued) bounded uniformly continuous function $f$ on $R^I$,

$$\lim_{n \to \infty}\frac{1}{n}\sum_{j=0}^{n-1}\left|\int f(T^jx)\,f(x)\,d\mu - \left(\int f\,d\mu\right)^2\right| = 0.$$

This condition can be replaced by

$$\lim_{n \to \infty}\frac{1}{n}\sum_{j=0}^{n-1}\left[\int f(T^jx)\,f(x)\,d\mu - \left(\int f\,d\mu\right)^2\right]^2 = 0. \qquad (1)$$

Since $\mu \times \mu$ is invariant under $T \times T$ and

$$\frac{1}{n}\sum_{j=0}^{n-1}\left[\int f(T^jx)\,f(x)\,d\mu\right]^2 = \frac{1}{n}\sum_{j=0}^{n-1}\iint f(T^jx)\,f(T^jy)\,f(x)\,f(y)\,d\mu(x)\,d\mu(y), \qquad (2)$$

the limit as $n \to \infty$ of the expression (2) exists for every stationary distribution. Therefore, in condition (1) we can replace $\lim_{n \to \infty}$ by $\liminf_{n \to \infty}$. Hereafter the proof is exactly the same as in the case of ergodic distributions.


REFERENCES 
SOME RESULTS ON UNBIASED ESTIMATION


By ROBERT V. HOGG 
and
ALLEN T. CRAIG 
University of Iowa 


SUMMARY. This paper is concerned with the existence of (i) unbiased statistics for certain parameters, and (ii) efficient statistics for certain parameters when the underlying probability density function has a given exponential form.


1. INTRODUCTION 

The problem of point estimation of parameters of distributions of probability has 
long interested mathematical statisticians, and the literature contains the important and 
interesting theorems of many investigators. In this paper, we wish to call attention to 
some additional results, which, in so far as we are aware, have not yet been pointed out. A 
particularly interesting theorem is one that is concerned with the distribution of joint 
efficient estimates of several parameters. 

Throughout this paper, probability density function is abbreviated p.d.f., and density means density with respect to Lebesgue measure.


2. UNBIASED ESTIMATION IN REGULAR CASES

Let $X_1, X_2, \dots, X_n$ denote the items of a random sample from a non-degenerate distribution which has a p.d.f. $f(x;\theta)$ that depends on the single parameter $\theta$, $\theta \in \Omega$, where $\Omega$ contains a non-degenerate interval. We take $f(x;\theta)$ to be of the exponential form

$$f(x;\theta) = \exp[\theta K(x) + S(x) + q(\theta)], \qquad a_1 < x < b_1,$$

zero elsewhere, where $a_1$ and $b_1$, finite or infinite, do not depend upon $\theta$, and $K'(x) = \frac{d}{dx}K(x)$ is continuous and not identically equal to zero. It is well known that $Y = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$ and that $Y$ has the p.d.f.

$$g(y;\theta) = \begin{cases} \exp[\theta y + T(y) + nq(\theta)], & a < y < b,\\ 0, & \text{elsewhere.}\end{cases}$$

Again, $a$ and $b$, finite or infinite, do not depend upon $\theta$. We then have the following theorem that gives, provided they exist, the unique unbiased statistics, based on $Y$, for $\theta^m$, $m = 1, 2, 3, \dots$. The conclusion of this theorem is the same as the conclusion of a theorem of Washio-Morimoto-Ikeda (1956). However, the hypotheses of the two theorems are different. We believe it is much easier to verify, for a specified $g(y;\theta)$, the hypotheses of the following theorem than it is to verify the hypotheses of the earlier theorem.

Theorem 1: Given that the preceding conditions on $f(x;\theta)$ and $g(y;\theta)$ are satisfied, let the derivatives $D_y^k\{\exp[T(y)]\}$, $k = 0, 1, \dots, m$, exist (a.e. P). Let

$$W_k = (-1)^k\,\frac{D_Y^k\{\exp[T(Y)]\}}{\exp[T(Y)]},$$

and let $E[W_k]$ exist for $k = 1, 2, \dots, m$. A necessary and sufficient condition that $W_1, W_2, \dots, W_m$ be the unbiased statistics for $\theta, \theta^2, \dots, \theta^m$ is that

$$\left\{\exp(\theta y)\,D_y^k\exp[T(y)]\right\}_{y=a} = \left\{\exp(\theta y)\,D_y^k\exp[T(y)]\right\}_{y=b}$$

for $k = 0, 1, \dots, m-1$.
Proof of sufficiency: The proof is by mathematical induction. If $k = 1$, we have

$$E\left[-\frac{D_Y\exp[T(Y)]}{\exp[T(Y)]}\right] = -\int_a^b D_y\{\exp[T(y)]\}\exp[\theta y + nq(\theta)]\,dy.$$

If we integrate by parts, the right member of this equation may be written

$$\left[-\exp\{T(y) + \theta y + nq(\theta)\}\right]_a^b + \theta\int_a^b \exp[\theta y + T(y) + nq(\theta)]\,dy.$$

The condition stated in the theorem implies that this last expression is equal to $\theta$. For the general $k$, we have

$$E\left[(-1)^k\frac{D_Y^k\exp[T(Y)]}{\exp[T(Y)]}\right] = (-1)^k\int_a^b D_y^k\{\exp[T(y)]\}\exp[\theta y + nq(\theta)]\,dy.$$

The right member of this equation may be written

$$(-1)^k\left[D_y^{k-1}\{\exp[T(y)]\}\exp[\theta y + nq(\theta)]\right]_a^b + \theta(-1)^{k-1}\int_a^b D_y^{k-1}\{\exp[T(y)]\}\exp[\theta y + nq(\theta)]\,dy. \qquad (2.1)$$

The condition stated in the theorem requires that this is equal to

$$\theta\,E\left[(-1)^{k-1}\frac{D_Y^{k-1}\exp[T(Y)]}{\exp[T(Y)]}\right].$$

We now invoke the induction hypothesis. Then the last expression may be written $\theta(\theta^{k-1}) = \theta^k$. Completeness (Lehmann and Scheffé, 1955) of the family $\{g(y;\theta);\ \theta \in \Omega\}$ insures uniqueness. This completes the proof of the sufficiency of the condition.


Proof of necessity: If expression (2.1) is equal to $\theta^k$ and if the integral in this expression is equal to $\theta^{k-1}$, for $k = 1, 2, \dots, m$, we have that

$$\exp[\theta y]\,D_y^{k-1}\{\exp[T(y)]\}$$

has the same value at $y = a$ as it does at $y = b$. This establishes Theorem 1.
We note from Theorem 1, provided our conditions are satisfied with $m \ge 1$, that

$$E\left[-\frac{D_Y\exp[T(Y)]}{\exp[T(Y)]}\right] = E[-T'(Y)] = \theta.$$

Moreover, with $m \ge 2$,

$$E\left[\frac{D_Y^2\exp[T(Y)]}{\exp[T(Y)]}\right] = E\{T''(Y) + [T'(Y)]^2\} = \theta^2,$$

and accordingly the variance of $-T'(Y)$ is

$$\operatorname{var}\{-T'(Y)\} = E\{[T'(Y)]^2\} - \theta^2 = -E[T''(Y)].$$

In Section 3 we discuss when, and only when, this variance is equal to the Rao-Cramér lower bound.
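As a sketch of Theorem 1 and the variance formula above, take the normal case $X_i \sim N(\theta, 1)$ (an assumed example), for which $K(x) = x$, $Y \sim N(n\theta, n)$ and $T(y) = -y^2/(2n) - \tfrac12\ln(2\pi n)$. Then $W_1 = -T'(Y) = Y/n$ and $W_2 = T''(Y) + [T'(Y)]^2 = Y^2/n^2 - 1/n$; the code checks $E[W_1] = \theta$, $E[W_2] = \theta^2$ and $\operatorname{var}\{-T'(Y)\} = -E[T''(Y)] = 1/n$ by numerical integration (midpoint rule over $\pm 12$ standard deviations). The function names are illustrative.

```python
import math


def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def expect(fn, n, theta, steps=100_000):
    """E[fn(Y)] for Y = X_1 + ... + X_n with X_i ~ N(theta, 1), by the midpoint rule."""
    mean, var = n * theta, float(n)
    half = 12 * math.sqrt(var)            # integrate over mean +/- 12 standard deviations
    lo, h = mean - half, 2 * half / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += fn(y) * normal_pdf(y, mean, var)
    return total * h


def w1(y, n):
    """-T'(y) for T(y) = -y**2/(2n) - log(2*pi*n)/2."""
    return y / n


def w2(y, n):
    """D^2 exp T / exp T = T''(y) + T'(y)**2 for the same T."""
    return y * y / n ** 2 - 1 / n
```

Here $1/n$ is also the Rao-Cramér bound, in line with Theorem 2 of Section 3, since $K(X) = X$ is normal.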


It is easily seen, from the proof of Theorem 1, that if appropriate changes are made in the hypotheses and if $f(x;\theta)$ is of the form

$$\exp[p(\theta)K(x) + S(x) + q(\theta)], \qquad a_1 < x < b_1,$$

then

$$E\left[-\frac{D_Y\exp[T(Y)]}{\exp[T(Y)]}\right] = E[-T'(Y)] = p(\theta).$$

Further, if we use anti-derivatives of $\exp[T(y)]$ instead of derivatives, we can, given hypotheses similar to those of Theorem 1, find unique unbiased statistics for $[p(\theta)]^{-k}$.
The following illustrative example points up the importance of the hypotheses of Theorem 1. Let $X$ have the p.d.f. $f(x;\theta) = \exp[\theta x + \ln x(1-x) + q(\theta)]$, $0 < x < 1$, zero elsewhere, where $\theta \ne 0$ and

$$\exp[q(\theta)] = \frac{\theta^3}{(\theta + 2) + (\theta - 2)\exp\theta}.$$

Consider a random sample of size one and let $Y = X$. Then $g(y;\theta) = f(y;\theta)$ and $T(y) = \ln y(1-y)$. It is easily verified that $E[-T'(Y)] = E[(2Y-1)/\{Y(1-Y)\}] = \theta$. To compute, in accordance with Theorem 1, the variance of $-T'(Y)$, we would need to be able to estimate $\theta^2$. But the conditions of the theorem are satisfied only for $m = 1$, and we cannot estimate $\theta^2$. As a matter of fact, in this instance, the variance of $-T'(Y)$ does not exist.
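The example can be checked numerically. The sketch integrates over $(0, 1)$ with the midpoint rule, using $\exp[q(\theta)]$ as given above; multiplying $-T'(y) = (2y-1)/\{y(1-y)\}$ by the density cancels the factor $y(1-y)$, which is why the first moment is finite. The function names are illustrative.

```python
import math


def exp_q(theta):
    """exp[q(theta)] = theta**3 / [(theta + 2) + (theta - 2) e**theta], theta != 0."""
    return theta ** 3 / ((theta + 2) + (theta - 2) * math.exp(theta))


def integrate01(fn, steps=100_000):
    """Midpoint rule on (0, 1); never evaluates fn at the end points."""
    h = 1.0 / steps
    return h * sum(fn((i + 0.5) * h) for i in range(steps))


def check_example(theta):
    """Return (total probability mass, E[-T'(Y)]) for the example density."""
    c = exp_q(theta)

    def density(y):
        return y * (1 - y) * math.exp(theta * y) * c

    def minus_t_prime_times_density(y):
        # (2y - 1)/(y(1 - y)) times y(1 - y) e^{theta y} c: the singular factor cancels
        return (2 * y - 1) * math.exp(theta * y) * c

    return integrate01(density), integrate01(minus_t_prime_times_density)
```

The second moment, by contrast, has an integrand behaving like $1/y$ near $0$ and like $1/(1-y)$ near $1$, so it diverges, as stated in the text.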


3. EFFICIENT STATISTICS : ONE PARAMETER CASE

In this section we shall investigate the conditions under which the unbiased statistic for $\theta$, namely $-T'(Y)$, is efficient; that is, when the variance of $-T'(Y)$ attains the lower bound of the Rao-Cramér (1945, 1946a) inequality. Specifically, the following theorem will be proved.

Theorem 2: Let $X_1, X_2, \dots, X_n$ denote a random sample from a distribution that has a p.d.f. which satisfies the conditions of Theorem 1 with $m \ge 2$. Let $Y = \sum_{i=1}^{n} K(X_i)$ and let $-T'(Y)$ denote the unbiased statistic for $\theta$. A necessary and sufficient condition that the variance of $-T'(Y)$ be equal to the Rao-Cramér lower bound is that $K(X)$ have a non-degenerate normal distribution.

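Proof of necessity: Assume first that $-T'(Y)$ is efficient. Since all regularity conditions (Cramér, 1946a) are satisfied, the variance of $-T'(Y)$ is given by

$$\frac{1}{-nE\left[\dfrac{\partial^2\ln f(X;\theta)}{\partial\theta^2}\right]} = \frac{1}{-nq''(\theta)},$$

and also

$$\frac{\partial\ln g(y;\theta)}{\partial\theta} = k_\theta\left[-T'(y) - \theta\right],$$

where $k_\theta$ does not depend upon $y$. With $g(y;\theta)$ of the form under consideration, the latter condition becomes

$$y + nq'(\theta) = k_\theta\left[-T'(y) - \theta\right].$$

Since $k_\theta \ne 0$, $T'(y)$ is the linear function $-\{y + nq'(\theta) + k_\theta\theta\}/k_\theta$. However, $T'(y)$ is a function of $y$ alone; thus both $k_\theta$ and $nq'(\theta) + k_\theta\theta$ are constants free of $\theta$. Then, since $q(\theta)$ does not depend upon $n$, we have $q'(\theta) = -c\theta + d$, or

$$q(\theta) = -\frac{c\theta^2}{2} + d\theta + e,$$

where $c$, $d$ and $e$ are constants free of $\theta$. Now the characteristic function of the distribution of $K(X)$ is

$$E[\exp(itK(X))] = \exp[q(\theta) - q(\theta + it)] = \exp\left[(c\theta - d)(it) - ct^2/2\right].$$

Thus, if $-T'(Y)$ is an efficient statistic for $\theta$, then $K(X)$, and hence $Y = \sum_{i=1}^{n} K(X_i)$, has a normal distribution.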
Proof of sufficiency: If $K(X)$, and hence $Y = \sum_{i=1}^{n} K(X_i)$, has a normal distribution, we know that

$$f(x;\theta) = \exp[\theta K(x) + S(x) - c\theta^2/2 + d\theta + e] = \exp\{\theta[K(x) + d] + [S(x) + e] - c\theta^2/2\}.$$

Without loss of generality, we can replace $[K(x) + d]$ by $K(x)$ and $y = \sum_{i=1}^{n} K(x_i) + nd$ by $y = \sum_{i=1}^{n} K(x_i)$ to obtain

$$g(y;\theta) = \exp\{\theta y + T(y) - nc\theta^2/2\}.$$

Since $g(y;\theta)$ is a normal p.d.f., we can differentiate

$$\int_{-\infty}^{\infty} g(y;\theta)\,dy = 1 \qquad (3.1)$$

with respect to $\theta$ under the integral. We find that

$$E(Y - nc\theta) = 0,$$

or $Y/(nc)$ is the function of $Y$ that is the unbiased statistic for $\theta$. From uniqueness, we know that $-T'(Y) = Y/(nc)$. The variance of this statistic is found by differentiating (3.1) twice with respect to $\theta$ to obtain

$$E[(Y - nc\theta)^2 - nc] = 0 \quad \text{or} \quad \operatorname{var}[Y/(nc)] = 1/(nc).$$

But this variance is equal to the Rao-Cramér lower bound $-1/[nq''(\theta)] = 1/(nc)$. Thus $-T'(Y) = Y/(nc)$ is the efficient statistic for $\theta$, and Theorem 2 is established.


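If in the preceding discussion $\theta$ is replaced by $p(\theta)$, we see that the variance of the unbiased statistic for $p(\theta)$ is equal to the Rao-Cramér lower bound

$$\frac{[p'(\theta)]^2}{nE\left[\left(\dfrac{\partial\ln f(X;\theta)}{\partial\theta}\right)^2\right]}$$

if, and only if, $K(X)$ has a non-degenerate normal distribution.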


4. EFFICIENT STATISTICS : MULTIPARAMETER CASE 


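Let $(X_{1j}, X_{2j}, \dots, X_{pj})$, $j = 1, 2, \dots, n$, denote a random sample from a non-degenerate p-variate distribution having p.d.f. of the form

$$\exp\left[\sum_{i=1}^{p}\theta_iK_i(x_1, \dots, x_p) + S(x_1, \dots, x_p) + q(\theta_1, \dots, \theta_p)\right],$$

where the domain of positive density does not depend upon $(\theta_1, \dots, \theta_p) \in \Omega$, where $\Omega$ contains a non-degenerate p-dimensional interval. Let $K_1, \dots, K_p$ be such that the sufficient statistics

$$Y_i = \sum_{j=1}^{n} K_i(X_{1j}, \dots, X_{pj}), \qquad i = 1, \dots, p,$$

have a non-degenerate p-variate distribution with p.d.f. $g(y_1, \dots, y_p;\theta_1, \dots, \theta_p)$ of the form

$$\exp\left[\sum_{i=1}^{p}\theta_iy_i + T(y_1, \dots, y_p) + nq(\theta_1, \dots, \theta_p)\right].$$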

Under these conditions we can prove the following theorem. 
Theorem 3: If the conditions stated above are satisfied, a necessary and sufficient condition that the unbiased statistics for $\theta_1, \ldots, \theta_p$, based on $Y_1, \ldots, Y_p$, are joint efficient statistics is that $K_i(X_1, \ldots, X_p)$, $i = 1, \ldots, p$, have a non-degenerate $p$-variate normal distribution.


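Proof of necessity: If $U_1(Y_1, \ldots, Y_p), \ldots, U_p(Y_1, \ldots, Y_p)$ are the joint efficient unbiased statistics for $\theta_1, \ldots, \theta_p$, we have (Cramér, 1946b) that

$$\frac{\partial \log g}{\partial\theta_i} = \sum_{j=1}^{p} k_{ij}\,[u_j(y_1, \ldots, y_p) - \theta_j], \quad i = 1, \ldots, p,$$

where the $k_{ij}$ do not depend upon $y_1, \ldots, y_p$. Under our conditions this becomes

$$y_i + n\frac{\partial q}{\partial\theta_i} = \sum_{j=1}^{p} k_{ij}(u_j - \theta_j), \quad i = 1, \ldots, p.$$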
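Obviously, the left members of these $p$ equations are linearly independent, so the right members are as well. Consequently the matrix $K = (k_{ij})$ of the coefficients is non-singular. Accordingly, if $y' = (y_1, \ldots, y_p)$, $u' = (u_1, \ldots, u_p)$, $\theta' = (\theta_1, \ldots, \theta_p)$, and

$$\Big(\frac{\partial q}{\partial\theta}\Big)' = \Big(\frac{\partial q}{\partial\theta_1}, \ldots, \frac{\partial q}{\partial\theta_p}\Big),$$

we have from the above equations that

$$u = K^{-1}y + K^{-1}\Big(n\frac{\partial q}{\partial\theta} + K\theta\Big).$$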
Hence $u_1, \ldots, u_p$ are linear functions of $y_1, \ldots, y_p$. Since $U_1, \ldots, U_p$ are functions of $Y_1, \ldots, Y_p$ alone, $K^{-1}$, and therefore $K$, and $K^{-1}(n\,\partial q/\partial\theta + K\theta)$ are matrices whose elements are constants free of $\theta$. That is,

$$\frac{\partial q}{\partial\theta} = -c\theta + d,$$

where $c = (c_{ij})$ and $d' = (d_1, \ldots, d_p)$ are matrices whose elements are constants free of $\theta$. Here $c = c'$, since the second mixed partial derivative of $q$ with respect to $\theta_i$ and $\theta_j$ does not depend upon the order of differentiation. Accordingly,

$$q = -\tfrac12\theta'c\theta + \theta'd + e,$$

where $e$ is a constant. The characteristic function of $K' = (K_1, \ldots, K_p)$ is then

$$E[\exp(it'K)] = \exp\big(it'(c\theta - d) - \tfrac12 t'ct\big),$$

where $t' = (t_1, \ldots, t_p)$ is real. This completes the proof of the necessity of the condition.


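Proof of sufficiency: Since $K_1, \ldots, K_p$ have a non-degenerate $p$-variate normal distribution, we can assume, possibly after trivial substitutions, that

$$q(\theta_1, \ldots, \theta_p) = -\tfrac12\theta'A\theta,$$

where $\theta' = (\theta_1, \ldots, \theta_p)$ and $A = (a_{ij})$ is a positive definite real symmetric matrix of order $p$ whose elements do not depend upon $\theta$. If we differentiate

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(y_1, \ldots, y_p;\theta_1, \ldots, \theta_p)\,dy_1\cdots dy_p = 1 \qquad (4.1)$$

with respect to $\theta_i$, we obtain

$$E\Big[Y_i + n\frac{\partial q}{\partial\theta_i}\Big] = 0, \quad i = 1, 2, \ldots, p,$$

or $E(Y_i) = n(a_{i1}\theta_1 + \cdots + a_{ip}\theta_p)$.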
That is, if $Y' = (Y_1, \ldots, Y_p)$, then

$$E(Y) = nA\theta.$$

Consequently, if $U = (1/n)A^{-1}Y$, where $U' = (U_1, \ldots, U_p)$, then $E(U) = (1/n)A^{-1}E(Y) = \theta$.


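That is, $U$ provides the unbiased statistics, based on $Y$, for $\theta$. If we consider the second partial derivatives of (4.1), we have

$$E\Big[\Big(Y_i + n\frac{\partial q}{\partial\theta_i}\Big)\Big(Y_j + n\frac{\partial q}{\partial\theta_j}\Big) + n\frac{\partial^2 q}{\partial\theta_i\,\partial\theta_j}\Big] = 0,$$

or

$$E\Big[\Big(Y_i + n\frac{\partial q}{\partial\theta_i}\Big)\Big(Y_j + n\frac{\partial q}{\partial\theta_j}\Big)\Big] = na_{ij}.$$

Thus the matrix of variances and covariances of $Y$ is $nA$. Then the matrix of variances and covariances of $U$ is

$$\Big(\frac{A^{-1}}{n}\Big)(nA)\Big(\frac{A^{-1}}{n}\Big) = \frac{A^{-1}}{n}.$$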


The coefficients of the equation of the ellipsoid of greatest possible concentration (Cramér, 1946b) about $\theta$ are

$$b_{ij} = -E\Big[\frac{\partial^2 \log g}{\partial\theta_i\,\partial\theta_j}\Big] = -n\frac{\partial^2 q}{\partial\theta_i\,\partial\theta_j} = na_{ij},$$

or $B = nA$. Since the inverse of the matrix of variances and covariances of $U$ is equal to $B$, we see that $U_1, \ldots, U_p$ are the joint efficient statistics for $\theta_1, \ldots, \theta_p$. This completes the proof of the theorem.
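Here is a minimal simulation sketch of the sufficiency half of Theorem 3. It assumes $q(\theta) = -\tfrac12\theta'A\theta$, so that each vector $K = (K_1, \ldots, K_p)'$ is normal with mean $A\theta$ and covariance $A$, matching $E(Y) = nA\theta$ and $\operatorname{cov}(Y) = nA$ above. The particular $A$, $\theta$ and sizes are illustrative assumptions. The sketch forms $U = (1/n)A^{-1}Y$ and compares its empirical covariance matrix with $(nA)^{-1} = B^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 2, 30, 20000
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # illustrative positive definite A
theta = np.array([0.7, -1.2])          # illustrative parameter values

# With q(theta) = -theta' A theta / 2, each vector K has mean A theta and covariance A.
K = rng.multivariate_normal(A @ theta, A, size=(reps, n))   # shape (reps, n, p)
Y = K.sum(axis=1)                                            # shape (reps, p)

# U = (1/n) A^{-1} Y, computed with a linear solve instead of an explicit inverse.
U = np.linalg.solve(A, Y.T).T / n

cov_U = np.cov(U, rowvar=False)
target = np.linalg.inv(n * A)          # (nA)^{-1}, the inverse of B = nA
print(np.round(U.mean(axis=0), 3))
print(np.round(cov_U, 4))
print(np.round(target, 4))
```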


An interesting observation is the following. Let us suppose that we are sampling from a $k$-variate distribution of the form assumed in Theorem 3, but let $k$ be less than $p$, where $p$ is the number of parameters on which the distribution depends. If we follow the preceding arguments, we see that $K_1(X_1, \ldots, X_k), \ldots, K_p(X_1, \ldots, X_k)$ must have a $p$-variate non-degenerate normal distribution if joint efficient statistics for $\theta_1, \ldots, \theta_p$ are to exist. However, with $k < p$ this is impossible; so, in instances like this, joint efficient statistics do not exist. To be specific, suppose a one-variable distribution of the form given depends upon two parameters. Then joint efficient statistics for those two parameters do not exist.
A MINIMAL SUFFICIENT STATISTIC FOR A GENERAL
CLASS OF DESIGNS


By DAVID L. WEEKS
Oklahoma State University
and
FRANKLIN A. GRAYBILL
Colorado State University


SUMMARY. This paper derives a minimal sufficient statistic for a class of designs which includes the balanced incomplete block design and the partially balanced incomplete block design with two associate classes as a subset. The derivation is given assuming a variance component model (Eisenhart Model II).


1. INTRODUCTION


The usefulness of finding a set of minimal sufficient statistics becomes apparent when we consider the Rao–Blackwell theorem, which shows that a minimum variance unbiased estimate must be a function of a sufficient statistic for the family of densities under consideration. A minimal sufficient statistic (which always exists) for a family of densities is desirable because, once it has been obtained, all the "information" contained in the sample about the indexing parameter has been condensed as far as possible.


After obtaining such a set, the completeness of the distribution of the minimal sufficient statistic should be determined. If this distribution is complete, then the problem of minimum variance unbiased estimation is solved. If not, then some function of the minimal sufficient statistic must be used. Unfortunately, if there are two or more unbiased estimates of a function of the parameter which are functions of a minimal sufficient statistic, the Rao–Blackwell theorem does not tell us which has minimum variance.
This paper exhibits a minimal sufficient statistic for a rather general class of designs under the assumption of an Eisenhart Model II (Eisenhart, 1947). The class of designs includes as a subset the balanced incomplete block designs and the partially balanced incomplete block designs.

The results for the group-divisible partially balanced incomplete block designs with two associate classes are given as an example of the application of the general results. It is interesting to note that the quantities normally calculated in an analysis of variance do not include all the statistics in the minimal sufficient statistic for those designs.
2. MODEL AND ASSUMPTIONS
We shall assume the matrix model

$$Y = X\gamma + e,$$

where $X = (X_0, X_1, X_2)$ and $\gamma' = (\mu, \beta', \tau')$, with the dimensions of the matrices and partitions as follows:

$Y\,(n\times 1)$; $X\,(n\times(b+t+1))$; $\gamma\,((b+t+1)\times 1)$; $e\,(n\times 1)$; $\mu\,(1\times 1)$; $\beta\,(b\times 1)$; $\tau\,(t\times 1)$; $X_0\,(n\times 1)$, where $x_{0j} = 1$ for $j = 1, 2, \ldots, n$; $X_1\,(n\times b)$, where $x_{1ij} = 0$ or $1$, $i = 1, 2, \ldots, b$; $j = 1, 2, \ldots, n$; $X_2\,(n\times t)$, where $x_{2ij} = 0$ or $1$, $i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, n$.

The following distributional assumptions will be made:

(1) $e$ is distributed as the multivariate normal, mean vector $\phi$, covariance matrix $\sigma^2 I_n$;

(2) $\beta$ is distributed as the multivariate normal, mean vector $\phi$, covariance matrix $\sigma_1^2 I_b$;

(3) $\tau$ is distributed as the multivariate normal, mean vector $\phi$, covariance matrix $\sigma_2^2 I_t$;

(4) $\mu$ is a scalar constant;

(5) $\operatorname{cov}(\beta, \tau) = \phi$, $\operatorname{cov}(\beta, e) = \phi$, $\operatorname{cov}(\tau, e) = \phi$;

where $\phi$ represents the null matrix (vector) and $I_p$ the $p\times p$ identity matrix. We shall assume in this paper that:

(1) $\operatorname{rank}(X) = b + t - 1$;

(2) $X_1'X_1 = kI_b$;

(3) $X_2'X_2 = rI_t$.

Additional terms will be defined as follows:

(1) $N = X_2'X_1$ ($N$ is the incidence matrix of the design);

(2) $A = X_2 - k^{-1}X_1N'$;

(3) $J_{pq}$ will denote a $(p\times q)$ matrix with every element unity;

(4) $j_s$ will denote an $(s\times 1)$ vector with every element unity;

(5) We shall adopt the following convention. If $B$ is a symmetric $b\times b$ matrix of rank $c$ and $P^*$ is an orthogonal matrix which diagonalizes $B$, we shall denote the $b\times b$ diagonal matrix $P^{*\prime}BP^*$ by $D_B^*$. A partition $P$ of $P^*$ which diagonalizes $B$ so that the $c$ non-zero characteristic roots are on the diagonal will be denoted by $P'BP = D_B$. Non-starred diagonal matrices are non-singular with this convention, and the characteristic vectors which yield such a diagonal matrix will not be starred.

(6) $f = n - b - t + 1$.
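The matrices defined above can be built explicitly for a small example. The sketch below uses a hypothetical connected design with $t = 4$ treatments and $b = 4$ blocks of size $k = 2$, so that $r = 2$ and $n = 8$. The design and the helper name `design_matrices` are illustrative choices, not taken from the paper. The sketch checks the assumptions $X_1'X_1 = kI_b$, $X_2'X_2 = rI_t$ and $\operatorname{rank}(X) = b + t - 1$, the identity $A'A = rI_t - k^{-1}NN'$, and the value of $f$.

```python
import numpy as np

def design_matrices(blocks, t):
    """Return X0, X1, X2 (plots x 1, plots x blocks, plots x treatments)."""
    plots = [(i, trt) for i, blk in enumerate(blocks) for trt in blk]
    n, b = len(plots), len(blocks)
    X0 = np.ones((n, 1))
    X1 = np.zeros((n, b))
    X2 = np.zeros((n, t))
    for j, (i, trt) in enumerate(plots):
        X1[j, i] = 1.0        # plot j lies in block i
        X2[j, trt] = 1.0      # plot j receives treatment trt
    return X0, X1, X2

# Hypothetical connected design: t = 4 treatments, b = 4 blocks of size k = 2, r = 2.
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
t, k, r = 4, 2, 2
X0, X1, X2 = design_matrices(blocks, t)
n, b = X1.shape

N = X2.T @ X1                 # t x b incidence matrix, N = X2'X1
A = X2 - X1 @ N.T / k         # A = X2 - k^{-1} X1 N'
X = np.hstack([X0, X1, X2])
f = n - b - t + 1

print(np.linalg.matrix_rank(X), b + t - 1, f)
```

The sketch also shows $A'X_1 = \phi$, which is what makes the treatment contrasts built from $A$ orthogonal to the block contrasts.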
3. CONSTRUCTING AN ORTHOGONAL TRANSFORMATION


In this section we shall construct an orthogonal matrix so that an orthogonal 
transformation can be made on the vector Y in order to facilitate the determination 
of a set of sufficient statistics for this model. 


Since $XX'$ is a symmetric $n\times n$ positive semi-definite matrix, there exists an orthogonal matrix $Q^*$ such that $Q^{*\prime}XX'Q^* = D^*_{XX'}$, where $D^*_{XX'}$ is diagonal with $b+t-1$ positive elements on the main diagonal and $n-b-t+1$ elements equal to zero.

If we partition $Q^*$ into $(Q_1, Q_2)$, we have

$$\begin{pmatrix}Q_1'\\ Q_2'\end{pmatrix}XX'(Q_1, Q_2) = \begin{pmatrix}Q_1'\\ Q_2'\end{pmatrix}(X_0X_0' + X_1X_1' + X_2X_2')(Q_1, Q_2) = \begin{pmatrix}D_{XX'} & \phi\\ \phi & \phi\end{pmatrix},$$

where $D_{XX'}$ is $(b+t-1)\times(b+t-1)$ and of rank $b+t-1$. It can be shown that $Q_2'X_0 = \phi$, $Q_2'X_1 = \phi$ and $Q_2'X_2 = \phi$. Let $Q_2$ be the last $n-b-t+1$ columns of a matrix $P$. For notational convenience let $Q_2 = P_f$.


We shall now construct the first $b+t-1$ columns of $P$ and show that the resulting matrix is orthogonal. Let $R$ be an $(n\times(b+t-1))$ matrix partitioned into $(R_1, R_2, R_3)$, where $R_1$, $R_2$ and $R_3$ are $(n\times 1)$, $(n\times(b-1))$ and $(n\times(t-1))$ matrices respectively.


Consider first the matrix $A'A = (X_2' - k^{-1}NX_1')(X_2 - k^{-1}X_1N')$. Since $A'A$ is a symmetric $t\times t$ matrix, there exists an orthogonal matrix $P_3^*$ (say) such that

$$P_3^{*\prime}A'AP_3^* = D^*_{A'A},$$

where $D^*_{A'A}$ is diagonal with the characteristic roots of $A'A$ on the main diagonal. Since the design was assumed connected, the rank of $A'A$ is $t-1$. Hence one of the diagonal elements of $D^*_{A'A}$ is zero. Corresponding to the unique characteristic root zero, there is a unique (normalized) vector $p$ such that $p'A'Ap = 0$. Since $j_t'A'Aj_t = 0$, $p = t^{-1/2}j_t$. Partition and construct $P_3^*$ such that $P_3^* = (p, P_3)$. Hence $P_3'A'AP_3 = D_{A'A}$, where $D_{A'A}$ is $(t-1)\times(t-1)$ of full rank and all its diagonal elements are positive.
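The role of connectedness can be seen numerically. The sketch below compares two hypothetical equireplicate designs with $t = 4$, $k = r = 2$: one connected and one that splits into two disconnected halves. It computes $A'A = rI_t - k^{-1}NN'$ through the identity of Section 2 rather than from $X_1$ and $X_2$. The connected design has exactly one zero root, whose characteristic vector is $\pm t^{-1/2}j_t$. The disconnected design has two.

```python
import numpy as np

def a_transpose_a(N, r, k):
    """A'A = r I_t - k^{-1} N N' for an equireplicate design with equal block sizes."""
    t = N.shape[0]
    return r * np.eye(t) - N @ N.T / k

def zero_root_count(M, tol=1e-9):
    """Number of characteristic roots of the symmetric matrix M that are numerically zero."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) < tol))

r, k = 2, 2
# Connected (hypothetical): blocks {0,2}, {0,3}, {1,2}, {1,3}; rows are treatments.
N_conn = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1]], dtype=float)
# Disconnected (hypothetical): blocks {0,1}, {0,1}, {2,3}, {2,3}.
N_disc = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=float)

AtA = a_transpose_a(N_conn, r, k)
vals, vecs = np.linalg.eigh(AtA)       # eigenvalues in ascending order
p = vecs[:, 0]                          # characteristic vector of the zero root
print(zero_root_count(AtA), zero_root_count(a_transpose_a(N_disc, r, k)))
print(np.round(np.abs(p), 4))           # t^{-1/2} j_t up to sign
```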
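Let us assume that there are $s$ distinct positive roots of $A'A$, denoted by $d_1, d_2, \ldots, d_s$, of multiplicities $m_1, m_2, \ldots, m_s$ respectively. Thus

$$\sum_{i=1}^{s} m_i = t - 1.$$

Partition the $t-1$ orthogonal vectors of $P_3$ such that $P_3 = (P_{31}, P_{32}, \ldots, P_{3s})$, where $P_{3i}$ is $(t\times m_i)$. Thus

$$P_3'A'AP_3 = \begin{pmatrix}P_{31}'\\ \vdots\\ P_{3s}'\end{pmatrix}A'A(P_{31}, \ldots, P_{3s}) = \begin{pmatrix}d_1I_{m_1} & \phi & \cdots & \phi\\ \phi & d_2I_{m_2} & \cdots & \phi\\ \vdots & & \ddots & \vdots\\ \phi & \phi & \cdots & d_sI_{m_s}\end{pmatrix}.$$

Since $(D_{A'A}^{-1/2}P_3'A')(AP_3D_{A'A}^{-1/2}) = I_{t-1}$, the rows of $D_{A'A}^{-1/2}P_3'A'$ form a set of $t-1$ orthonormal vectors of dimension $n$. Let the $R_3$ portion of $R$ (of $P$) be equal to $D_{A'A}^{-1/2}P_3'A'$; i.e.,

$$R_3 = D_{A'A}^{-1/2}P_3'A'.$$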
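The construction of $R_3$ can be checked directly. The sketch below uses a hypothetical design (blocks $\{0,2\}, \{0,3\}, \{1,2\}, \{1,3\}$, so $t = 4$, $k = r = 2$) and forms $R_3 = D_{A'A}^{-1/2}P_3'A'$ from the positive roots of $A'A$. It verifies three things: the $t-1$ rows are orthonormal, they are orthogonal to the columns of $X_1$ (because $A'X_1 = \phi$), and they are orthogonal to $j_n$.

```python
import numpy as np

# Hypothetical design: blocks {0,2}, {0,3}, {1,2}, {1,3}; t = 4, k = r = 2.
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
t, k = 4, 2
plots = [(i, trt) for i, blk in enumerate(blocks) for trt in blk]
X1 = np.zeros((len(plots), len(blocks)))
X2 = np.zeros((len(plots), t))
for j, (i, trt) in enumerate(plots):
    X1[j, i] = 1.0
    X2[j, trt] = 1.0
N = X2.T @ X1
A = X2 - X1 @ N.T / k

vals, vecs = np.linalg.eigh(A.T @ A)
pos = vals > 1e-9                       # keep the t - 1 positive roots
P3, D = vecs[:, pos], vals[pos]
R3 = np.diag(D ** -0.5) @ P3.T @ A.T    # R3 = D^{-1/2} P3' A', shape (t-1) x n
print(np.round(R3 @ R3.T, 10))
```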


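Since $A'A = rI_t - k^{-1}NN'$, the $t\times t$ orthogonal matrix $P_3^*$ also diagonalizes $NN'$. Indeed, since $NN' = rkI_t - kA'A$,

$$P_3^{*\prime}NN'P_3^* = P_3^{*\prime}(rkI_t - kA'A)P_3^* = rkI_t - kD^*_{A'A}.$$

Thus

$$P_3^{*\prime}NN'P_3^* = D^*_{NN'} = \begin{pmatrix}rk & \phi\\ \phi & rkI_{t-1} - kD_{A'A}\end{pmatrix},$$

with $P_3'NN'P_3 = rkI_{t-1} - kD_{A'A}$.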


Let us adopt the convention that the characteristic root of $A'A$ which is equal to $r$, if there is one, is $d_1$ (otherwise we set $m_1 = 0$). Thus the multiplicity of the zero characteristic root of $NN'$ will be $m_1$.

The characteristic root $rk$ of $NN'$ is unique, and since $p'NN'p = rk$, $p$ is the characteristic vector of $NN'$ which corresponds to the root $rk$. Now


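$$\begin{pmatrix}p'\\ P_{31}'\\ P_{32}'\\ \vdots\\ P_{3s}'\end{pmatrix}NN'(p, P_{31}, P_{32}, \ldots, P_{3s}) = \begin{pmatrix}rk & \phi & \phi\\ \phi & \phi_{m_1} & \phi\\ \phi & \phi & D_{NN'}\end{pmatrix},$$

where $\phi_{m_1}$ is the $m_1\times m_1$ null matrix and $D_{NN'} = \operatorname{diag}(d_2'I_{m_2}, \ldots, d_s'I_{m_s})$. Let

$$k(r - d_i) = d_i', \quad i = 2, \ldots, s.$$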


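Now the non-zero characteristic roots of $N'N$ are identical to the non-zero characteristic roots of $NN'$ and have the same multiplicities. Hence, letting $P_2^*$ denote an orthogonal $b\times b$ matrix which diagonalizes $N'N$, we may write

$$P_2^{*\prime}N'NP_2^* = \begin{pmatrix}rk & \phi & \phi\\ \phi & \phi_{m_1+m_0} & \phi\\ \phi & \phi & D_{NN'}\end{pmatrix},$$

where $m_0 = b - t$ (note that $m_0$ may be less than, greater than, or equal to zero). Partition $P_2^*$ into $(P_{21}, P_{20}, P_2)$, where $P_{21}$, $P_{20}$ and $P_2$ are of dimension $b\times 1$, $b\times(m_1+m_0)$ and $b\times\sum_{i=2}^{s}m_i$ respectively, and write $P_2 = (P_{22}, \ldots, P_{2s})$ with $P_{2i}$ of dimension $b\times m_i$, so that

$$P_2^{*\prime}N'NP_2^* = \begin{pmatrix}P_{21}'\\ P_{20}'\\ P_2'\end{pmatrix}N'N(P_{21}, P_{20}, P_2) = \begin{pmatrix}rk & \phi & \phi\\ \phi & \phi_{m_1+m_0} & \phi\\ \phi & \phi & D_{NN'}\end{pmatrix}.$$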


g : га 
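The rows of the matrix $D_{NN'}^{-1/2}(P_{32}, \ldots, P_{3s})'N$ form a set of $\sum_{i=2}^{s} m_i$ orthonormal vectors of dimension $b$, and they are the characteristic vectors which diagonalize $N'N$ to give the non-zero characteristic roots of $N'N$ other than $rk$. Thus

$$D_{NN'}^{-1/2}(P_{32}, \ldots, P_{3s})'N(N'N)N'(P_{32}, \ldots, P_{3s})D_{NN'}^{-1/2} = D_{NN'},$$

and the characteristic vectors corresponding to the non-zero characteristic roots of $N'N$ can thus be expressed in terms of the characteristic vectors of the non-zero roots of $NN'$. In particular, we may take $P_{2i} = d_i'^{-1/2}N'P_{3i}$, $i = 2, \ldots, s$.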


Corresponding to the unique characteristic root $rk$ of $N'N$ is the characteristic vector $b^{-1/2}j_b$, since $b^{-1}j_b'N'Nj_b = rk$.
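The correspondence between the characteristic vectors of $NN'$ and $N'N$ can be illustrated with a hypothetical balanced incomplete block design. The design uses $t = 3$ treatments, with every pair occurring in two blocks of size $k = 2$, so $b = 6$ and $r = 4$. Here no root of $A'A$ equals $r$, so $m_1 = 0$ and the zero root of $N'N$ has multiplicity $m_0 = b - t$. The sketch maps each characteristic vector $v$ of $NN'$ with root $d' > 0$ to $N'v/\sqrt{d'}$ and checks that the result is an orthonormal set of characteristic vectors of $N'N$ with the same roots. It also checks that $N'Nj_b = rkj_b$.

```python
import numpy as np

# Hypothetical BIBD: each pair of t = 3 treatments appears in two blocks of size k = 2,
# so b = 6, r = 4 and b > t.
pairs = [(0, 1), (0, 2), (1, 2)] * 2
t, b, r, k = 3, len(pairs), 4, 2
N = np.zeros((t, b))
for j, blk in enumerate(pairs):
    N[list(blk), j] = 1.0

ev_t, V = np.linalg.eigh(N @ N.T)        # roots and vectors of NN' (t x t)
ev_b = np.linalg.eigvalsh(N.T @ N)       # roots of N'N (b x b)
keep = ev_t > 1e-9

# Map each characteristic vector v of NN' (root d > 0) to N'v / sqrt(d).
W = N.T @ V[:, keep] / np.sqrt(ev_t[keep])
print(np.round(ev_t, 6), np.round(ev_b, 6))
```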
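Now we are ready to define the orthogonal matrix $P$. Let

$$R_1 = n^{-1/2}j_n', \qquad R_2 = \begin{pmatrix}k^{-1/2}P_{20}'X_1'\\ k^{-1/2}D_{NN'}^{-1/2}(P_{32}, \ldots, P_{3s})'NX_1'\end{pmatrix}, \qquad R_3 = D_{A'A}^{-1/2}P_3'A',$$

where $P_{20}$ denotes the $b\times(m_1+m_0)$ block of characteristic vectors of $N'N$ belonging to the zero root. Then, with $P_f = Q_2$,

$$P = \begin{pmatrix}n^{-1/2}j_n'\\ k^{-1/2}P_{20}'X_1'\\ k^{-1/2}D_{NN'}^{-1/2}(P_{32}, \ldots, P_{3s})'NX_1'\\ D_{A'A}^{-1/2}P_3'A'\\ P_f'\end{pmatrix}.$$

It can be shown that the matrix $P$ defined in this manner is orthogonal.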


4. A SET OF SUFFICIENT STATISTICS FOR THE VARIANCE COMPONENTS


The $n\times 1$ vector $Y$ is distributed as the multivariate normal with mean vector $X_0\mu = \mu j_n$ and covariance matrix $X_1X_1'\sigma_1^2 + X_2X_2'\sigma_2^2 + \sigma^2 I_n = \Sigma$ (say).

The joint density of the elements of the vector $Y$ is then

$$q(Y;\theta) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\{-\tfrac12(Y - \mu j_n)'\Sigma^{-1}(Y - \mu j_n)\},$$

where $\theta' = (\mu, \sigma_1^2, \sigma_2^2, \sigma^2)$.


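Consider now

$$q(Y;\theta) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\{-\tfrac12(Y - \mu j_n)'PP'\Sigma^{-1}PP'(Y - \mu j_n)\} = \frac{1}{(2\pi)^{n/2}|P'\Sigma P|^{1/2}}\exp\{-\tfrac12(P'Y - P'\mu j_n)'P'\Sigma^{-1}P(P'Y - P'\mu j_n)\},$$

where $P$ is the orthogonal matrix defined in the previous section.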


The forms of $P'\Sigma^{-1}P$ and $P'(Y - \mu j_n)$ are needed in order to define a set of sufficient statistics for this family of density functions.
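Since $P'\Sigma^{-1}P = (P'\Sigma P)^{-1}$, let us determine the form of $P'\Sigma P$ and then invert the resulting matrix. From the manner in which $P$ was defined we have

$$P'\Sigma P = \begin{pmatrix}
\sigma^2 + k\sigma_1^2 + r\sigma_2^2 & \phi & \phi & \phi & \phi & \phi\\
\phi & (\sigma^2 + k\sigma_1^2)I_{m_1+m_0} & \phi & \phi & \phi & \phi\\
\phi & \phi & (\sigma^2 + k\sigma_1^2)I_p + k^{-1}D_{NN'}\sigma_2^2 & \phi & k^{-1/2}D_{NN'}^{1/2}D_{A'A}^{1/2}\sigma_2^2 & \phi\\
\phi & \phi & \phi & (\sigma^2 + r\sigma_2^2)I_{m_1} & \phi & \phi\\
\phi & \phi & k^{-1/2}D_{NN'}^{1/2}D_{A'A}^{1/2}\sigma_2^2 & \phi & \sigma^2 I_p + D_{A'A}\sigma_2^2 & \phi\\
\phi & \phi & \phi & \phi & \phi & \sigma^2 I_f
\end{pmatrix},$$

where $p = \sum_{i=2}^{s} m_i$ and, in this display, $D_{A'A} = \operatorname{diag}(d_2I_{m_2}, \ldots, d_sI_{m_s})$.

More explicitly, we have:

(1) $(\sigma^2 + k\sigma_1^2)I_p + k^{-1}D_{NN'}\sigma_2^2 = \operatorname{diag}\big((\sigma^2 + k\sigma_1^2 + k^{-1}d_2'\sigma_2^2)I_{m_2}, \ldots, (\sigma^2 + k\sigma_1^2 + k^{-1}d_s'\sigma_2^2)I_{m_s}\big)$;

(2) $\sigma^2 I_p + D_{A'A}\sigma_2^2 = \operatorname{diag}\big((\sigma^2 + d_2\sigma_2^2)I_{m_2}, \ldots, (\sigma^2 + d_s\sigma_2^2)I_{m_s}\big)$;

(3) $k^{-1/2}D_{NN'}^{1/2}D_{A'A}^{1/2}\sigma_2^2 = \operatorname{diag}\big(k^{-1/2}(d_2'd_2)^{1/2}\sigma_2^2 I_{m_2}, \ldots, k^{-1/2}(d_s'd_s)^{1/2}\sigma_2^2 I_{m_s}\big)$.

For each root $d_i$ ($i = 2, \ldots, s$), blocks (1), (2) and (3) combine into a $2\times 2$ pattern whose determinant is $\Delta_i = (\sigma^2 + k\sigma_1^2 + k^{-1}d_i'\sigma_2^2)(\sigma^2 + d_i\sigma_2^2) - k^{-1}d_i'd_i\sigma_2^4$. On inversion, the block of $(P'\Sigma P)^{-1}$ corresponding to (1) becomes $(\sigma^2 + d_i\sigma_2^2)\Delta_i^{-1}I_{m_i}$.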
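Two entries of $P'\Sigma P$ can be checked numerically. The first is the leading diagonal element $n^{-1}j_n'\Sigma j_n = \sigma^2 + k\sigma_1^2 + r\sigma_2^2$. The second is the treatment-contrast block $R_3\Sigma R_3' = \sigma^2 I + \sigma_2^2 D_{A'A}$, in which $\sigma_1^2$ drops out because $A'X_1 = \phi$. The design and the variance components below are illustrative assumptions. In this particular design one root of $A'A$ equals $r$, so the block also contains the $(\sigma^2 + r\sigma_2^2)$ entry.

```python
import numpy as np

# Hypothetical design (t = 4, b = 4, k = r = 2) and illustrative variance components.
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
t, k, r = 4, 2, 2
sigma2, sigma1_2, sigma2_2 = 1.0, 0.5, 0.25     # sigma^2, sigma_1^2, sigma_2^2

plots = [(i, trt) for i, blk in enumerate(blocks) for trt in blk]
n = len(plots)
X1 = np.zeros((n, len(blocks)))
X2 = np.zeros((n, t))
for j, (i, trt) in enumerate(plots):
    X1[j, i] = 1.0
    X2[j, trt] = 1.0

Sigma = sigma1_2 * X1 @ X1.T + sigma2_2 * X2 @ X2.T + sigma2 * np.eye(n)

# First row of P: R1 = n^{-1/2} j_n'.
R1 = np.ones(n) / np.sqrt(n)
first = R1 @ Sigma @ R1                          # sigma^2 + k sigma_1^2 + r sigma_2^2

# Treatment-contrast rows R3 = D^{-1/2} P3' A'.
N = X2.T @ X1
A = X2 - X1 @ N.T / k
vals, vecs = np.linalg.eigh(A.T @ A)
pos = vals > 1e-9
R3 = np.diag(vals[pos] ** -0.5) @ vecs[:, pos].T @ A.T
block = R3 @ Sigma @ R3.T                        # sigma^2 I + sigma_2^2 D_{A'A}
print(first, np.round(block, 6))
```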
According to Neyman (1935), the $3s+1$ statistics defined as follows:

$s_1 = \bar y$,
$s_2 = k^{-1}Y'X_1P_{20}P_{20}'X_1'Y$,
$s_3 = r^{-1}Y'AP_{31}P_{31}'A'Y$,
$s_4 = Y'P_fP_f'Y$,
$s_{5i} = k^{-1}Y'X_1P_{2i}P_{2i}'X_1'Y$,
$s_{6i} = d_i^{-1}Y'AP_{3i}P_{3i}'A'Y$,
$s_{7i} = Y'X_1P_{2i}P_{3i}'A'Y$, $\quad i = 2, \ldots, s$,

are a set of sufficient statistics for $\theta$ in this family of density functions.


Since $P_{2i} = [k(r - d_i)]^{-1/2}N'P_{3i}$, we may alternatively write $s_{5i}$ and $s_{7i}$ as $k^{-1}[k(r - d_i)]^{-1}Y'X_1N'P_{3i}P_{3i}'NX_1'Y$ and $[k(r - d_i)]^{-1/2}Y'X_1N'P_{3i}P_{3i}'A'Y$, respectively.


4. MINIMALITY OF THE SET OF SUFFICIENT STATISTICS

We shall show in this section that the $3s+1$ statistics defined in Section 3 form a minimal set of sufficient statistics for $q(Y;\theta)$.

Lehmann and Scheffé (1950) have set forth a procedure by which a set of sufficient statistics may be shown to be minimal. In brief, their method involves proving that the function

$$K(Y, Y_0) = \frac{q(Y;\theta)}{q(Y_0;\theta)}$$

being independent of the parameters implies $S = S_0$, where $S$ is a vector-valued sufficient statistic for $\theta$ in $q(Y;\theta)$ and is usually considered to be a "proposed" minimal set. $S_0$ is obtained from $q(Y_0;\theta)$ in the same way that $S$ was obtained from $q(Y;\theta)$. $S$ is an "operation" on a density function which reduces the dimension of the space of the sufficient statistics.


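With these ideas in mind we proceed:

$$K(Y, Y_0) = \frac{q(Y;\theta)}{q(Y_0;\theta)} = \exp\{-(g - g_0)\},$$

where $g$ and $g_0$ denote the exponents (with sign reversed) of $q(Y;\theta)$ and $q(Y_0;\theta)$. These are written by means of the orthogonal transformation $P$ as defined in Section 2 and the statistics $s$ as defined in Section 3. Let

$w_1 = n[(s_1 - \mu)^2 - (s_{10} - \mu)^2]$,
$w_2 = s_2 - s_{20}$,
$w_3 = s_3 - s_{30}$,
$w_4 = s_4 - s_{40}$,
$w_{5i} = s_{5i} - s_{5i0}$, $\quad i = 2, \ldots, s$,
$w_{6i} = s_{6i} - s_{6i0}$, $\quad i = 2, \ldots, s$,
$w_{7i} = s_{7i} - s_{7i0}$, $\quad i = 2, \ldots, s$,

where the final subscript 0 denotes a statistic computed from $Y_0$.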
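Then we may write $\exp\{-(g - g_0)\}$ as

$$\exp\Big\{-\frac12\Big(\sum_{j=1}^{4} w_jf_j + \sum_{i=2}^{s}(w_{5i}f_{5i} + w_{6i}f_{6i} + w_{7i}f_{7i})\Big)\Big\},$$

where

$f_1 = (\sigma^2 + k\sigma_1^2 + r\sigma_2^2)^{-1}$,
$f_2 = (\sigma^2 + k\sigma_1^2)^{-1}$,
$f_3 = (\sigma^2 + r\sigma_2^2)^{-1}$,
$f_4 = (\sigma^2)^{-1}$,
$f_{5i} = (\sigma^2 + d_i\sigma_2^2)\Delta_i^{-1}$,
$f_{6i} = [\sigma^2 + k\sigma_1^2 + (r - d_i)\sigma_2^2]\Delta_i^{-1}$, $\quad (i = 2, 3, \ldots, s)$,
$f_{7i} = -2\sigma_2^2[(r - d_i)/k]^{1/2}\Delta_i^{-1}$,

and $\Delta_i = (\sigma^2 + k\sigma_1^2)(\sigma^2 + d_i\sigma_2^2) + (r - d_i)\sigma^2\sigma_2^2$.

Now $\exp\{-(g - g_0)\}$ will be independent of the parameters if the exponent in $K(Y, Y_0)$ is identically zero for every value of the parameters. For this to hold, the $w_j$ and $w_{ji}$ defined above must all vanish, and this follows if and only if the $f$'s form a set of linearly independent functions. To simplify the proof that these functions are linearly independent, consider the following transformation.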


Let $x = \sigma^2$, $y = \sigma^2 + k\sigma_1^2$ and $z = \sigma^2 + r\sigma_2^2$. Then

$$\Delta_i = x(y + z - x) + \frac{d_i}{r}(y - x)(z - x),$$

and

$$f_1 = \frac{1}{y + z - x}, \quad f_2 = \frac1y, \quad f_3 = \frac1z, \quad f_4 = \frac1x,$$

$$f_{5i} = \frac{x + r^{-1}d_i(z - x)}{\Delta_i}, \quad f_{6i} = \frac{y + r^{-1}(r - d_i)(z - x)}{\Delta_i}, \quad f_{7i} = -\frac{2(z - x)[(r - d_i)/k]^{1/2}}{r\,\Delta_i}, \quad (i = 2, 3, \ldots, s).$$

A common denominator of these $3s+1$ functions is

$$G = xyz(y + z - x)\prod_{i=2}^{s}\Delta_i.$$

Multiplying through by $G$, we may write the $3s+1$ functions as follows:

$Gf_1 = xyz\prod_{i}\Delta_i$, $\quad Gf_2 = xz(y + z - x)\prod_{i}\Delta_i$, $\quad Gf_3 = xy(y + z - x)\prod_{i}\Delta_i$, $\quad Gf_4 = yz(y + z - x)\prod_{i}\Delta_i$,

$Gf_{5i} = xyz(y + z - x)[x + r^{-1}d_i(z - x)]\prod_{j\ne i}\Delta_j$,

$Gf_{6i} = xyz(y + z - x)[y + r^{-1}(r - d_i)(z - x)]\prod_{j\ne i}\Delta_j$,

$Gf_{7i} = -2r^{-1}[(r - d_i)/k]^{1/2}\,xyz(y + z - x)(z - x)\prod_{j\ne i}\Delta_j$, $\quad i = 2, \ldots, s$.

To show that these $3s+1$ functions are linearly independent, we must show that the identity

$$\sum_{j=1}^{4} w_jf_j + \sum_{i=2}^{s}(w_{5i}f_{5i} + w_{6i}f_{6i} + w_{7i}f_{7i}) = 0$$

holding for every $x$, $y$, $z$ implies $w_j = 0$, $j = 1, 2, 3, 4$, and $w_{ji} = 0$, $j = 5, 6, 7$; $i = 2, 3, \ldots, s$. We shall prove this by equating coefficients of like terms on both sides of the identity above. Each of $Gf_1$, $Gf_2$, $Gf_3$ and $Gf_4$ contains a term that occurs in none of the other functions. Hence these four functions are mutually linearly independent and are linearly independent of $f_{5i}$, $f_{6i}$ and $f_{7i}$ ($i = 2, 3, \ldots, s$). We must therefore show that the $3(s-1)$ functions $f_{5i}$, $f_{6i}$ and $f_{7i}$ ($i = 2, \ldots, s$) are mutually linearly independent.
To this end, expand $Gf_{5i}$, $Gf_{6i}$ and $Gf_{7i}$ in powers of $x$, $y$ and $z$ and choose $3(s-1)$ suitable terms. We find the coefficients of these terms, equate them to their values in the identity, and arrange them in matrix form. This gives a homogeneous linear system in the $w_{5i}$, $w_{6i}$ and $w_{7i}$.

Let $H$, $K$ and $M$ denote the blocks of the coefficient matrix of this system, and let $W_j = (w_{j2}, \ldots, w_{js})'$, $j = 5, 6, 7$. The system can then be written in partitioned form: a coefficient matrix built from $H$, $K$, $M$ and null blocks, acting on $(W_5', W_6', W_7')'$, equated to $\phi$. If the determinant of the matrix on the left is non-zero, the only solution of this system is $W_5 = W_6 = W_7 = \phi$. It has been shown [Graybill and Hultquist (1961)] that $|H| \neq 0$, and hence the determinant of the above system is also non-zero. Hence the only solution is

$$W_5 = W_6 = W_7 = \phi,$$

and the functions are linearly independent. This then implies the minimality of the set of sufficient statistics set forth in Section 3.
5. AN APPLICATION TO GD-PBIB’S 


In this section we shall apply the results of the previous sections to the group-divisible partially balanced incomplete block designs with two associate classes [Bose, Clatworthy and Shrikhande (1953)].

The characteristic roots of the matrix $A'A$ for the singular, semi-regular and regular GD-PBIB's, together with the multiplicities of the roots, are as follows.

                    multiplicity:  1     m−1        m(n−1)
    singular                       0     mnλ₂/k     r
    semi-regular                   0     r          k⁻¹(rk − r + λ₁)
    regular                        0     mnλ₂/k     k⁻¹(rk − r + λ₁)
Applying these results to the results of the previous sections, we have

singular: $d_1 = r$, $m_1 = m(n-1)$; $d_2 = mn\lambda_2/k$, $m_2 = m - 1$;

semi-regular: $d_1 = r$, $m_1 = m - 1$; $d_2 = k^{-1}(rk - r + \lambda_1)$, $m_2 = m(n-1)$;

regular: $m_1 = 0$; $d_2 = mn\lambda_2/k$, $m_2 = m - 1$; $d_3 = k^{-1}(rk - r + \lambda_1)$, $m_3 = m(n-1)$.

In the regular GD-PBIB, we have no roots of $A'A$ equal to $r$.
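The table of roots can be checked against a small example. The sketch below assumes the standard result that $NN'$ for a GD design with $m$ groups of $n$ treatments has roots $rk$ (multiplicity 1), $r - \lambda_1$ (multiplicity $m(n-1)$) and $rk - mn\lambda_2$ (multiplicity $m-1$). The roots of $A'A = rI - k^{-1}NN'$ then follow directly. It also uses the usual classification: singular if $r = \lambda_1$, semi-regular if $r > \lambda_1$ and $rk = mn\lambda_2$, regular otherwise. The example design (groups $\{0,1\}$ and $\{2,3\}$, each block pairing treatments from different groups, so $\lambda_1 = 0$ and $\lambda_2 = 1$) is a hypothetical semi-regular GD design.

```python
import numpy as np

def gd_roots(r, k, lam1, lam2, m, n):
    """Predicted roots of A'A = r I - NN'/k for a GD design with m groups of size n."""
    roots = ([0.0]
             + [m * n * lam2 / k] * (m - 1)
             + [(r * k - r + lam1) / k] * (m * (n - 1)))
    return np.sort(np.array(roots))

def classify(r, k, lam1, lam2, m, n):
    """Singular if r = lambda_1; semi-regular if rk = mn lambda_2; otherwise regular."""
    if r == lam1:
        return "singular"
    if r * k == m * n * lam2:
        return "semi-regular"
    return "regular"

# Hypothetical GD design: groups {0,1} and {2,3}; each block pairs treatments from different groups.
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
t, b, r, k, lam1, lam2, m, n = 4, 4, 2, 2, 0, 1, 2, 2
N = np.zeros((t, b))
for j, blk in enumerate(blocks):
    N[list(blk), j] = 1.0

observed = np.linalg.eigvalsh(r * np.eye(t) - N @ N.T / k)
predicted = gd_roots(r, k, lam1, lam2, m, n)
print(classify(r, k, lam1, lam2, m, n), np.round(observed, 6), predicted)
```

As the table predicts for a semi-regular design, the root $r = 2$ appears with multiplicity $m - 1 = 1$.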


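We then have the following sets of minimal sufficient statistics for the three designs considered here:

(A) singular

(1) $\bar y$
(2) $k^{-1}Y'X_1P_{20}P_{20}'X_1'Y$ if $b > m$, and not defined if $b = m$
(3) $r^{-1}Y'AP_{31}P_{31}'A'Y$
(4) $Y'P_fP_f'Y$
(5) $k^{-1}Y'X_1P_{22}P_{22}'X_1'Y$
(6) $\dfrac{k}{mn\lambda_2}Y'AP_{32}P_{32}'A'Y$
(7) $Y'X_1P_{22}P_{32}'A'Y$ or $(rk - mn\lambda_2)^{-1/2}Y'X_1N'P_{32}P_{32}'A'Y$

(B) semi-regular

(1) $\bar y$
(2) $k^{-1}Y'X_1P_{20}P_{20}'X_1'Y$ if $b > t - m + 1$, and not defined if $b = t - m + 1$
(3) $r^{-1}Y'AP_{31}P_{31}'A'Y$
(4) $Y'P_fP_f'Y$
(5) $k^{-1}Y'X_1P_{22}P_{22}'X_1'Y$
(6) $\dfrac{k}{rk - r + \lambda_1}Y'AP_{32}P_{32}'A'Y$
(7) $Y'X_1P_{22}P_{32}'A'Y$ or $(r - \lambda_1)^{-1/2}Y'X_1N'P_{32}P_{32}'A'Y$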
(C) regular

(1) $\bar y$
(2) $k^{-1}Y'X_1P_{20}P_{20}'X_1'Y$ if $b > t$, and not defined if $b = t$
(3) $Y'P_fP_f'Y$
(4) $k^{-1}Y'X_1P_{22}P_{22}'X_1'Y$
(5) $k^{-1}Y'X_1P_{23}P_{23}'X_1'Y$
(6) $\dfrac{k}{mn\lambda_2}Y'AP_{32}P_{32}'A'Y$
(7) $\dfrac{k}{rk - r + \lambda_1}Y'AP_{33}P_{33}'A'Y$
(8) $Y'X_1P_{22}P_{32}'A'Y$ or $(rk - mn\lambda_2)^{-1/2}Y'X_1N'P_{32}P_{32}'A'Y$
(9) $Y'X_1P_{23}P_{33}'A'Y$ or $(r - \lambda_1)^{-1/2}Y'X_1N'P_{33}P_{33}'A'Y$
ON UNIQUENESS AND MAXIMA OF THE ROOTS OF LIKELIHOOD
EQUATIONS UNDER TRUNCATED AND CENSORED
SAMPLING FROM NORMAL POPULATIONS


By S. A. D. C. DOSS
University of Poona, India


SUMMARY. It is shown that, when estimating from truncated and censored samples either both parameters of a normal population jointly or only one of them, the other being known, the maximum likelihood estimating equations possess a unique solution, and that solution provides the absolute maximum of the likelihood function for all samples of any size.


1. INTRODUCTION


The problem of estimating the mean, $\mu$, and the standard deviation, $\sigma$, of normal populations from truncated and censored samples has been considered by Cohen (1950, 1959), Gupta (1952), Hald (1949) and many others. These authors have presented maximum likelihood estimating equations which are often solvable only by some iterative process, and they have tacitly assumed that their systems of equations have a solution. The purpose of the present paper is to show that these systems of equations possess a unique solution and that the solution maximises the likelihood function absolutely for all samples of any size. Three different cases of the problem of estimating the parameters of normal populations are considered, viz., (I) $\mu$ alone is estimated, $\sigma$ being known; (II) $\sigma$ alone is estimated, $\mu$ being known; and (III) both $\mu$ and $\sigma$ are unknown and are estimated jointly. Throughout the paper, by truncated and censored samples we mean doubly truncated and doubly censored samples.


Huzurbazar (1948) presented certain general necessary and sufficient conditions for the existence of a solution of the likelihood equation for every sample of any size in the case of uniparametric distributions belonging to the Pitman–Koopman class. In Section 2 the condition given by Huzurbazar is generalised to multiparametric distributions belonging to the same class. Distributions belonging to the Pitman–Koopman class retain their property of possessing sufficient statistics even after truncation (see Pitman (1936) and Tukey (1949)), and the normal distribution is a member of this class. Hence, in the case of truncated samples, to prove the existence of a solution we need only show that the condition given by Huzurbazar (1948) is satisfied in cases (I) and (II), and that its generalisation is satisfied in case (III). Once these conditions are satisfied, uniqueness and maxima of the solution follow easily from Huzurbazar's (1948, 1949) result: for distributions admitting sufficient statistics, the likelihood equations have a unique solution for every sample of any size, and the solution does make the likelihood function a maximum. Incidentally, in the light of the results of Huzurbazar (1949) and Tukey (1949), the results of Des Raj (1954) presented in his Section 5 need no proof.


We consider independent random observations from a normal population with probability density function (PDF)

$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty.$$
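If $x_\alpha$ and $x_\beta$ ($x_\beta > x_\alpha$) are the points of truncation, we have a truncated sample when all record of the observations of a sample below $x_\alpha$ and above $x_\beta$ is omitted. On the other hand, we have a censored sample when the numbers of observations of a sample falling in the regions $(-\infty, x_\alpha)$ and $(x_\beta, \infty)$ are noted, though their individual values are not measured. Throughout we shall adopt the following notation:

$$t_\alpha = \frac{x_\alpha - \mu}{\sigma}, \quad t_\beta = \frac{x_\beta - \mu}{\sigma}, \quad \phi(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2}, \quad \phi_\alpha = \phi(t_\alpha), \quad \phi_\beta = \phi(t_\beta), \quad \Phi_\alpha = \int_{-\infty}^{t_\alpha}\phi(t)\,dt, \quad \Phi_\beta = \int_{-\infty}^{t_\beta}\phi(t)\,dt.$$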


2. DOUBLY TRUNCATED SAMPLES

Before showing that the necessary and sufficient condition given by Huzurbazar (1948) for the existence of a solution of the likelihood equation for all samples of any size is satisfied in cases (I) and (II), we shall generalise his condition to multiparametric distributions. In the uniparametric case the PDF of a distribution of the Pitman–Koopman class is given as (see Pitman (1936))

$$f(x,\theta) = \exp\{u_1(\theta)v_1(x) + A(x) + B(\theta)\},$$

where $u_1$ and $B$ are functions of $\theta$, and $v_1$ and $A$ are functions of $x$. In this case Huzurbazar's necessary and sufficient condition for the existence of a solution of the likelihood equation for every sample of any size is that the functions $v_1(x)$ and $-\frac{\partial B(\theta)}{\partial\theta}\big/\frac{\partial u_1(\theta)}{\partial\theta}$ should have the same range of values. In the multiparametric case the PDF of a distribution of the Pitman–Koopman class is of the form (see Koopman (1936))

$$f(x,\theta_j) = \exp\Big\{\sum_{i=1}^{p}u_i(\theta_j)v_i(x) + A(x) + B(\theta_j)\Big\},$$

where $\theta_j$ stands for $(\theta_1, \theta_2, \ldots, \theta_p)$, the parameters of the distribution. Now, the necessary and sufficient condition for the system of likelihood equations to have a solution for all samples of any size can easily be seen to be the following. The transformation defined by $(v_1(x), \ldots, v_p(x))$ and the transformation defined by the solution $(\bar v_1, \ldots, \bar v_p)$ of

$$\sum_{i=1}^{p}\bar v_i\frac{\partial u_i(\theta_j)}{\partial\theta_k} = -\frac{\partial B(\theta_j)}{\partial\theta_k}, \quad k = 1, 2, \ldots, p,$$

should have the same range. We shall now show that Huzurbazar's condition is satisfied in cases (I) and (II), and its generalisation in case (III). We exclude the trivial cases where all the sample observations are equal to either $x_\alpha$ or $x_\beta$.


Case (I): Estimation of $\mu$ alone, $\sigma$ being known. Without loss of generality let us take $\sigma$ to be unity. Then the PDF of the population from which the samples are to be drawn is

$$f_T(x;\mu) = \frac{e^{-(x-\mu)^2/2}}{\sqrt{2\pi}\,(\Phi_\beta - \Phi_\alpha)}, \quad x_\alpha \le x \le x_\beta. \qquad (2.1)$$
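Let $x_1, x_2, \ldots, x_n$ be a sample of size $n$ drawn from the above population. The likelihood function of the sample is

$$L_T(\mu) = (2\pi)^{-n/2}(\Phi_\beta - \Phi_\alpha)^{-n}\exp\Big\{-\frac12\sum_{i=1}^{n}(x_i - \mu)^2\Big\}. \qquad (2.2)$$

In this case

$$u_1(\mu) = \mu, \quad v_1(x) = x, \quad B(\mu) = -\log(\Phi_\beta - \Phi_\alpha) - \frac{\mu^2}{2}, \qquad (2.3)$$

and

$$-\frac{\partial B(\mu)}{\partial\mu}\Big/\frac{\partial u_1(\mu)}{\partial\mu} = \mu - \frac{\phi_\beta - \phi_\alpha}{\Phi_\beta - \Phi_\alpha}.$$

By Cauchy's mean value theorem we have

$$\frac{\phi_\beta - \phi_\alpha}{\Phi_\beta - \Phi_\alpha} = -\xi, \qquad (2.4)$$

where $t_\alpha < \xi < t_\beta$. Hence

$$-\frac{\partial B(\mu)}{\partial\mu}\Big/\frac{\partial u_1(\mu)}{\partial\mu} = \mu + \xi = x', \qquad (2.5)$$

where $x_\alpha < x' < x_\beta$. Thus both $v_1(x)$ and $-\frac{\partial B(\mu)}{\partial\mu}\big/\frac{\partial u_1(\mu)}{\partial\mu}$ have the same range $(x_\alpha, x_\beta)$. Hence the likelihood equation $\partial\log L_T(\mu)/\partial\mu = 0$ has a unique solution, and it maximises the likelihood function $L_T(\mu)$ for all samples of any size.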
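In Case (I) the likelihood equation reduces to $E_\mu(X) = \bar x$. Here $E_\mu(X) = \mu - (\phi_\beta - \phi_\alpha)/(\Phi_\beta - \Phi_\alpha)$ is the mean of the truncated population, which increases with $\mu$. The root can therefore be found by a bracketing method. Below is a minimal sketch using SciPy's `brentq`, assuming $\sigma = 1$ and a sample mean strictly inside $(x_\alpha, x_\beta)$. The helper names, the bracket-widening rule and the `max_width` cap are illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def truncated_mean(mu, xa, xb):
    """E[X] for N(mu, 1) doubly truncated to (xa, xb)."""
    ta, tb = xa - mu, xb - mu
    # Probability mass of (ta, tb), taken from the tail that avoids cancellation.
    mass = norm.sf(ta) - norm.sf(tb) if ta > 0 else norm.cdf(tb) - norm.cdf(ta)
    return mu + (norm.pdf(ta) - norm.pdf(tb)) / mass

def mle_mu_truncated(x, xa, xb, max_width=16.0):
    """Root of the likelihood equation: the mu whose truncated mean equals the sample mean."""
    xbar = float(np.mean(x))
    if not xa < xbar < xb:
        raise ValueError("sample mean must lie strictly inside (xa, xb)")
    g = lambda mu: truncated_mean(mu, xa, xb) - xbar
    width = 1.0
    # E_mu[X] increases with mu, so widen the bracket until g changes sign.
    while g(xbar - width) > 0 or g(xbar + width) < 0:
        width *= 2.0
        if width > max_width:
            raise RuntimeError("could not bracket the root")
    return brentq(g, xbar - width, xbar + width)

print(mle_mu_truncated([-0.5, 0.5, 0.2, -0.2], -1.0, 1.0))
print(mle_mu_truncated([0.1, 0.9, 0.5, 0.7], 0.0, 1.0))
```

By symmetry, the first call should return a value indistinguishable from zero.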


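Case (II): Estimation of $\sigma$ alone, $\mu$ being known. In this case, without loss of generality, we shall take $\mu$ to be zero. Then the PDF of the population from which the samples are now to be drawn is

$$f_T(x;\sigma) = \frac{e^{-x^2/2\sigma^2}}{\sigma\sqrt{2\pi}\,(\Phi_\beta - \Phi_\alpha)}, \quad x_\alpha \le x \le x_\beta. \qquad (2.6)$$

Let $x_1, x_2, \ldots, x_n$ be a sample of size $n$ drawn from a normal population with PDF (2.6). The likelihood function of the sample is

$$L_T(\sigma) = (\sigma\sqrt{2\pi})^{-n}(\Phi_\beta - \Phi_\alpha)^{-n}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2\Big\}. \qquad (2.7)$$

This time

$$u_1(\sigma) = -\frac{1}{2\sigma^2}, \quad v_1(x) = x^2, \quad B(\sigma) = -\{\log\sigma + \log(\Phi_\beta - \Phi_\alpha)\}, \qquad (2.8)$$

and

$$-\frac{\partial B(\sigma)}{\partial\sigma}\Big/\frac{\partial u_1(\sigma)}{\partial\sigma} = \sigma^2\Big\{1 - \frac{t_\beta\phi_\beta - t_\alpha\phi_\alpha}{\Phi_\beta - \Phi_\alpha}\Big\}.$$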
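Again, by Cauchy's mean value theorem, we have

$$\frac{t_\beta\phi_\beta - t_\alpha\phi_\alpha}{\Phi_\beta - \Phi_\alpha} = 1 - \xi^2, \qquad (2.9)$$

where $t_\alpha < \xi < t_\beta$. Hence

$$-\frac{\partial B(\sigma)}{\partial\sigma}\Big/\frac{\partial u_1(\sigma)}{\partial\sigma} = \sigma^2\xi^2 = x'^2, \qquad (2.10)$$

where $x' = \sigma\xi$ lies in $(x_\alpha, x_\beta)$. Thus $v_1(x)$ and $-\frac{\partial B(\sigma)}{\partial\sigma}\big/\frac{\partial u_1(\sigma)}{\partial\sigma}$ have the same range. Hence in this case also the likelihood equation $\partial\log L_T(\sigma)/\partial\sigma = 0$ possesses a unique solution, and this solution provides an absolute maximum of $L_T(\sigma)$ for all samples of any size.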


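Case (III): Joint estimation of $\mu$ and $\sigma$. In this case, the PDF of the population is

$$f_T(x;\mu,\sigma) = \frac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma\sqrt{2\pi}\,(\Phi_\beta - \Phi_\alpha)}, \quad x_\alpha \le x \le x_\beta, \qquad (2.11)$$

and the likelihood function of the sample is

$$L_T(\mu,\sigma) = (\sigma\sqrt{2\pi})^{-n}(\Phi_\beta - \Phi_\alpha)^{-n}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\Big\}. \qquad (2.12)$$

Now

$$u_1(\mu,\sigma) = -\frac{1}{2\sigma^2}, \quad u_2(\mu,\sigma) = \frac{\mu}{\sigma^2}, \quad v_1(x) = x^2, \quad v_2(x) = x, \quad B(\mu,\sigma) = -\Big\{\frac{\mu^2}{2\sigma^2} + \log\sigma + \log(\Phi_\beta - \Phi_\alpha)\Big\}. \qquad (2.13)$$

The two functions entering the generalised condition of Huzurbazar can be evaluated by Cauchy's mean value theorem, as in (2.4) and (2.9). The function corresponding to $v_2(x) = x$ works out to $\mu + \xi\sigma$, and the function corresponding to $v_1(x) = x^2$ works out to $\mu^2 + 2\mu\sigma\xi + \sigma^2\xi'^2$, where $t_\alpha < \xi, \xi' < t_\beta$.

Thus the generalised condition of Huzurbazar is satisfied in this case. Hence the likelihood equations $\partial\log L_T(\mu,\sigma)/\partial\mu = 0$ and $\partial\log L_T(\mu,\sigma)/\partial\sigma = 0$ have a unique solution for all samples of any size, and the solution maximises the likelihood function absolutely.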
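3. DOUBLY CENSORED SAMPLES: CASE (I)

Let a doubly censored sample of size $N = r + n + s$ be drawn from a normal population with mean $\mu$ and standard deviation unity, there being $r$ observations in the left truncated region and $s$ observations in the right truncated region. The likelihood function of such a sample is

$$L_C(\mu) = C_1\Phi_\alpha^{\,r}(1 - \Phi_\beta)^{s}\exp\Big[-\frac12\sum_{i=1}^{n}(x_i - \mu)^2\Big], \qquad (3.1)$$

where $C_1$ is a constant. Hence the maximum likelihood estimating equation for $\mu$ is

$$\frac{\partial\log L_C(\mu)}{\partial\mu} = -r\frac{\phi_\alpha}{\Phi_\alpha} + s\frac{\phi_\beta}{1 - \Phi_\beta} + \sum_{i=1}^{n}(x_i - \mu) = 0. \qquad (3.2)$$

To find out whether (3.2) has a solution, let us see which sign $\partial\log L_C(\mu)/\partial\mu$ has for extreme values of $\mu$. It can be seen that

$$\lim_{\mu\to-\infty}\frac{\partial\log L_C(\mu)}{\partial\mu} = \infty. \qquad (3.3)$$

On the other hand,

$$\lim_{\mu\to\infty}\frac{\partial\log L_C(\mu)}{\partial\mu} = -\infty. \qquad (3.4)$$

Because $\partial\log L_C(\mu)/\partial\mu$ is continuous for all $\mu$, it follows from (3.3) and (3.4) that the likelihood equation $\partial\log L_C(\mu)/\partial\mu = 0$ has at least one solution in the range $(-\infty, \infty)$ of $\mu$.

Now we shall show that the likelihood equation (3.2) has one and only one solution, and that the likelihood function $L_C(\mu)$ has its absolute maximum at that point, for all samples of any size. Consider the second order derivative of $\log L_C(\mu)$ w.r.t. $\mu$,

$$\frac{\partial^2\log L_C(\mu)}{\partial\mu^2} = -\Big[n + r\frac{\phi_\alpha(\phi_\alpha + t_\alpha\Phi_\alpha)}{\Phi_\alpha^2} + s\frac{\phi_\beta\{\phi_\beta - t_\beta(1 - \Phi_\beta)\}}{(1 - \Phi_\beta)^2}\Big]. \qquad (3.5)$$

Since

$$\frac{\partial(\phi_\alpha + t_\alpha\Phi_\alpha)}{\partial t_\alpha} = \Phi_\alpha > 0 \quad\text{and}\quad \lim_{t_\alpha\to-\infty}(\phi_\alpha + t_\alpha\Phi_\alpha) = 0,$$

it follows that

$$\phi_\alpha + t_\alpha\Phi_\alpha > 0 \quad\text{for all } t_\alpha. \qquad (3.6)$$

Again, since

$$\frac{\partial\{\phi_\beta - t_\beta(1 - \Phi_\beta)\}}{\partial t_\beta} = -(1 - \Phi_\beta) < 0 \quad\text{and}\quad \lim_{t_\beta\to\infty}\{\phi_\beta - t_\beta(1 - \Phi_\beta)\} = 0,$$

it also follows that

$$\phi_\beta - t_\beta(1 - \Phi_\beta) > 0 \quad\text{for all } t_\beta. \qquad (3.7)$$

Hence

$$\frac{\partial^2\log L_C(\mu)}{\partial\mu^2} < 0 \quad\text{for all } \mu. \qquad (3.8)$$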
It follows that every solution of (3.2) provides a maximum of the likelihood function. But, since $\log L_C(\mu)$ is continuous, between two maxima there would have to be a minimum, for which

$$\frac{\partial^2\log L_C(\mu)}{\partial\mu^2} \ge 0. \qquad (3.9)$$

This contradicts (3.8). Hence the likelihood equation (3.2) has a unique solution, and that solution provides the absolute maximum of the likelihood function $L_C(\mu)$ for all samples of any size.
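Because (3.8) makes the score strictly decreasing, and (3.3)–(3.4) give its signs at the extremes, a bracketing root finder locates the unique solution of (3.2). Below is a minimal sketch with $\sigma = 1$. The function names and the bracket-stepping rule are illustrative assumptions. The ratios $\phi_\alpha/\Phi_\alpha$ and $\phi_\beta/(1 - \Phi_\beta)$ are computed through SciPy's `logpdf`, `logcdf` and `logsf` so that they stay finite in the far tails.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def score_mu(mu, x, r, s, xa, xb):
    """Left side of (3.2): d log L_C / d mu for a doubly censored N(mu, 1) sample."""
    x = np.asarray(x, dtype=float)
    ta, tb = xa - mu, xb - mu
    left = np.exp(norm.logpdf(ta) - norm.logcdf(ta))    # phi_a / Phi_a
    right = np.exp(norm.logpdf(tb) - norm.logsf(tb))    # phi_b / (1 - Phi_b)
    return -r * left + s * right + np.sum(x - mu)

def mle_mu_censored(x, r, s, xa, xb):
    """Unique root of (3.2); the score is strictly decreasing in mu by (3.8)."""
    f = lambda mu: score_mu(mu, x, r, s, xa, xb)
    step = 1.0 + (xb - xa)
    lo, hi = xa - step, xb + step
    while f(lo) <= 0:      # (3.3): the score tends to +infinity as mu -> -infinity
        lo -= step
    while f(hi) >= 0:      # (3.4): the score tends to -infinity as mu -> +infinity
        hi += step
    return brentq(f, lo, hi)

print(mle_mu_censored([-0.5, 0.5, 0.1, -0.1], 2, 2, -1.0, 1.0))
```

With no censored observations ($r = s = 0$) the estimate reduces to the sample mean. With symmetric data and $r = s$, $x_\alpha = -x_\beta$, it is zero.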
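4. DOUBLY CENSORED SAMPLES: CASE (II)


Now let a censored sample of size $N = r + n + s$ be drawn from a normal population with mean zero and standard deviation $\sigma$. The likelihood function of the sample is

$$L_C(\sigma) = C_2\Phi_\alpha^{\,r}(1 - \Phi_\beta)^{s}\sigma^{-n}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2\Big\}, \qquad (4.1)$$

and the likelihood equation for $\sigma$ is

$$\frac{\partial\log L_C(\sigma)}{\partial\sigma} = \frac1\sigma\Big[-n - r\frac{t_\alpha\phi_\alpha}{\Phi_\alpha} + s\frac{t_\beta\phi_\beta}{1 - \Phi_\beta} + \frac{1}{\sigma^2}\sum_{i=1}^{n}x_i^2\Big] = 0. \qquad (4.2)$$

To find out whether (4.2) has a solution, let us first see which sign $\sigma\,\partial\log L_C(\sigma)/\partial\sigma$ has for extreme values of $\sigma$. We can easily see that

$$\lim_{\sigma\to0}\sigma\frac{\partial\log L_C(\sigma)}{\partial\sigma} = \infty \qquad (4.3)$$

and

$$\lim_{\sigma\to\infty}\sigma\frac{\partial\log L_C(\sigma)}{\partial\sigma} = -n < 0. \qquad (4.4)$$

Since $\sigma\,\partial\log L_C(\sigma)/\partial\sigma$ is a continuous function of $\sigma$, it follows from (4.3) and (4.4) that the likelihood equation (4.2) has at least one solution in the range $(0, \infty)$ of $\sigma$.

To establish uniqueness and maximum of the solution, consider the second order derivative of $\log L_C(\sigma)$ w.r.t. $\sigma$. We have

$$\frac{\partial^2\log L_C(\sigma)}{\partial\sigma^2} = \frac{1}{\sigma^2}\Big[n - \frac{3}{\sigma^2}\sum_{i=1}^{n}x_i^2 + r\frac{t_\alpha\phi_\alpha}{\Phi_\alpha}\Big(2 - t_\alpha^2 - \frac{t_\alpha\phi_\alpha}{\Phi_\alpha}\Big) - s\frac{t_\beta\phi_\beta}{1 - \Phi_\beta}\Big(2 - t_\beta^2 + \frac{t_\beta\phi_\beta}{1 - \Phi_\beta}\Big)\Big]. \qquad (4.5)$$

For all the solutions of (4.2),

$$\frac{\partial^2\log L_C(\sigma)}{\partial\sigma^2} < 0. \qquad (4.6)$$

As in Section 3, this leads to a contradiction if (4.2) has more than one solution. Hence the likelihood equation $\partial\log L_C(\sigma)/\partial\sigma = 0$ has one and only one solution for all samples of any size, and this solution makes the likelihood function $L_C(\sigma)$ a maximum.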
5. DOUBLY CENSORED SAMPLES : CASE (I)


In this case, a doubly censored sample of size N = r + n + s is drawn from a normal population with mean μ and standard deviation σ. The likelihood function of the sample is
$$L(\mu,\sigma) \propto \sigma^{-n}\,\Phi_\alpha^{\,r}\,(1-\Phi_\beta)^{s}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big\}. \tag{5.1}$$


Now, for all the solutions of the likelihood equations $\partial \log L(\mu,\sigma)/\partial\mu = 0$ and $\partial \log L(\mu,\sigma)/\partial\sigma = 0$, we have
$$-\sigma^2\,\frac{\partial^2 \log L(\mu,\sigma)}{\partial\mu^2} = a_\alpha + a_\beta + n, \tag{5.2}$$
together with corresponding expressions for $-\sigma^2\,\partial^2 \log L(\mu,\sigma)/\partial\mu\,\partial\sigma$ and $-\sigma^2\,\partial^2 \log L(\mu,\sigma)/\partial\sigma^2$; these, and the definitions (5.3) of $a_\alpha$ and $a_\beta$, are illegible in the source. Hence
[Equation (5.4), which expresses the matrix of second derivatives of $\log L(\mu,\sigma)$, with reversed sign, as a sum of Gram matrices, is illegible in the source.]
The right-hand side of (5.4) is a sum of Gram matrices; the rank of the first matrix is 2 and that of each of the other matrices is unity. Hence the matrix on the left side of (5.4) is positive definite. Therefore it is evident that the solution of the likelihood equations $\partial \log L(\mu,\sigma)/\partial\mu = 0$ and $\partial \log L(\mu,\sigma)/\partial\sigma = 0$ is unique, and the likelihood function $L(\mu,\sigma)$ is maximum at this point for every sample of any size.
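The step from the Gram-matrix decomposition to positive definiteness can be spelled out as follows. This is a sketch under the reading that (5.4) has the form shown; the labels $H$, $A$ and $g_j$ are ours, not the author's.

```latex
H = A A^{\top} + \sum_{j} g_j g_j^{\top}, \qquad A \in \mathbb{R}^{2\times m},\ \operatorname{rank} A = 2,
\\[4pt]
z \neq 0 \;\Longrightarrow\; z^{\top} H z = \lVert A^{\top} z \rVert^{2} + \sum_{j} \big(g_j^{\top} z\big)^{2} \;\ge\; \lVert A^{\top} z \rVert^{2} \;>\; 0 .
```

The last inequality holds because $A^{\top} z = 0$ with $\operatorname{rank} A = 2$ forces $z = 0$. The rank-one terms can only add non-negative amounts, so the full-rank first Gram matrix alone already makes the sum positive definite.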


In conclusion the author wishes to express his gratitude to Dr. V. S. Huzurbazar for his constant encouragement and guidance. He wishes to thank his colleague Mr. P. S.
THE PROBLEM OF TESTING LINEAR HYPOTHESIS ABOUT
POPULATION MEANS WHEN THE POPULATION
VARIANCES ARE NOT EQUAL AND M-TEST


By SAIBAL BANERJEE 
Indian Statistical Institute 


SUMMARY. Given k independent samples of $n_i$ units from k populations $N(m_i, \sigma_i^2)$ (i = 1, ..., k), a test statistic for testing a hypothesis $H_0$ about s (s ≤ k) linear functions of the k population means, without any a priori knowledge of the population variances or of the ratios of the variances, is of interest. A new test statistic, called the M-statistic, is defined for testing such hypotheses when no prior knowledge about the population variances is available. The error of the first kind (probability of rejection of the hypothesis when true) of the test statistic depends on the unknown population variances, but the test statistic is so defined that for all possible values of the population variances the error of the first kind is less than or equal to some pre-assigned probability α. It is shown that critical values of the test statistic for testing a hypothesis about two linear functions of k population means with α = 0.05, 0.02, 0.01, etc., can all be obtained from tabulated values of the F-table. A numerical example for testing equality of three population means has been considered. It is also shown that the test statistic can be used in multivariate problems as well. An analysis of Barnard's data (Barnard, 1935) has been considered.


1. INTRODUCTION


1.1. Given k samples of $n_i$ units from k normal populations $N_i(m_i, \sigma_i^2)$ (i = 1, 2, ..., k) having equal variances, or with the ratios of the variances known a priori, any hypothesis about any linear function $\sum_{i=1}^{k} c_i m_i$ of population means (where $c_i$ (i = 1, 2, ..., k) are known coefficients) can be tested by the t-statistic. Also, any hypothesis about more than one linear function of population means can be tested by the F-statistic or F-ratio. If the population variances are not equal, or the ratios of the variances are not known a priori, it is possible to test (Banerjee, 1961) any hypothesis about any single linear function of population means. Also, any hypothesis about more than one linear function of population means can be tested by a new statistic, hereinafter called the M-statistic or M-ratio.


2. SAMPLES FROM HETEROSCEDASTIC POPULATIONS


2.1. Let $\bar x_i$, $s_i^2$ (i = 1, 2, ..., k) be the sample estimates of the population means and variances from k samples of $n_i$ units drawn from k normal populations $N_i(m_i, \sigma_i^2)$ (i = 1, 2, ..., k). Suppose it is required to test the hypothesis $H_0$ that
$$c_{11}m_1 + c_{12}m_2 + \cdots + c_{1k}m_k = M_1, \tag{2.1.1}$$
$$c_{21}m_1 + c_{22}m_2 + \cdots + c_{2k}m_k = M_2, \tag{2.1.2}$$
$$\vdots$$
$$c_{s1}m_1 + c_{s2}m_2 + \cdots + c_{sk}m_k = M_s, \tag{2.1.3}$$
where $c_{ij}$ (i = 1, 2, ..., s; j = 1, 2, ..., k) and $M_i$ (i = 1, 2, ..., s) are known constants. It is assumed without any loss of generality that the relations (2.1.1), (2.1.2), ..., (2.1.3) are mutually consistent and independent. It is also assumed that s < k, for if s = k the relations (2.1.1), (2.1.2), ..., (2.1.3) can be replaced by
$$m_i = M_i \quad (i = 1, 2, \ldots, k)$$
and $H_0$ can be tested by the statistic
$$\sum_{i=1}^{k}\left[\frac{\bar x_i - M_i}{s_i/\sqrt{n_i}}\right]^2,$$
whose percentage points, although not tabulated, can be evaluated, as each $(\bar x_i - M_i)\sqrt{n_i}/s_i$ (i = 1, 2, ..., k) would be independently distributed as a Student's t-variate if the hypothesis is true.
2.2. Let the test variates $U_1, U_2, \ldots, U_s$ be defined as
$$U_i = \sum_{j=1}^{k} c_{ij}\,\bar x_j \quad (i = 1, 2, \ldots, s). \tag{2.2.1}$$
The test variates $U_1, U_2, \ldots, U_s$ as defined in (2.2.1) are stochastic variates jointly distributed in a multivariate normal form.
2.3. Now let us consider the probability of the inequality
$$\sum_{i=1}^{s}(U_i - M_i)^2 > \sum_{j=1}^{k}\lambda_j\,C_j\,\frac{s_j^2}{n_j},$$
where $C_1, C_2, \ldots, C_k$ are defined as
$$C_j = \sum_{i=1}^{s} c_{ij}^2 \quad (j = 1, 2, \ldots, k),$$
and $\lambda_j$ (j = 1, 2, ..., k) are positive constants to be suitably determined in a manner discussed later.

2.4. Let $M_1', M_2', \ldots, M_s'$ be respectively the means of the test variates $U_1, U_2, \ldots, U_s$, whereas by hypothesis $H_0$ the means are $M_1, M_2, \ldots, M_s$. Let variates $u_i$ (i = 1, 2, ..., s) be defined as
$$u_i = U_i - M_i' \quad (i = 1, 2, \ldots, s). \tag{2.4.1}$$
The $u_i$ (i = 1, 2, ..., s) as defined in (2.4.1) follow a multivariate normal distribution with zero mean and, say, dispersion matrix R. Now consider a further transformation (Ferrar, 1953) to variates $v_i$ (i = 1, 2, ..., s) such that
$$\sum_{i=1}^{s} v_i^2 = u u' \quad\text{and}\quad \sum_{i=1}^{s}\frac{v_i^2}{g_i^2} = u R^{-1} u', \tag{2.4.2}$$
where u is the row matrix $(u_1, u_2, \ldots, u_s)$, u' is the transpose of u, and $R^{-1}$ is the s × s matrix reciprocal to the dispersion matrix R. The transformed variates $v_i$ (i = 1, 2, ..., s) are independently normally distributed with zero mean and variance, say, $g_i^2$ (i = 1, 2, ..., s).
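For s = 2 the transformation (2.4.2) can be made concrete with an orthogonal diagonalisation of R. This is one standard way to obtain such variates and is offered as an illustration, not as Ferrar's exact construction. In the sketch below (all names and numbers hypothetical) the columns of an orthogonal matrix P are unit eigenvectors of R and v = uP; then $\sum v_i^2 = uu'$, the covariance of v is $P'RP$, which is diagonal, and the eigenvalues play the role of the variances $g_i^2$, so that $uR^{-1}u' = \sum v_i^2/g_i^2$. The code checks both identities for one vector u, and also that the variances add up to the trace of R.

```python
import math


def eig_sym_2x2(a, b, c):
    """Eigenvalues (descending) and unit eigenvectors of the symmetric matrix [[a, b], [b, c]]."""
    m, d = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    lam1, lam2 = m + d, m - d
    if b != 0.0:
        e1 = (b, lam1 - a)              # solves (R - lam1 I) e1 = 0
    else:
        e1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*e1)
    e1 = (e1[0] / norm, e1[1] / norm)
    e2 = (-e1[1], e1[0])                # orthogonal to e1, hence the other eigenvector
    return (lam1, lam2), (e1, e2)


def transform(u, R):
    """Return v = u P and the variances g_i^2 of the transformed variates."""
    (a, b), (_, c) = R
    variances, (e1, e2) = eig_sym_2x2(a, b, c)
    v = (u[0] * e1[0] + u[1] * e1[1], u[0] * e2[0] + u[1] * e2[1])
    return v, variances


def quad_form_inverse(u, R):
    """u R^{-1} u' for a 2 x 2 dispersion matrix R."""
    (a, b), (_, c) = R
    det = a * c - b * b
    return (c * u[0] ** 2 - 2 * b * u[0] * u[1] + a * u[1] ** 2) / det


R = ((2.0, 0.6), (0.6, 1.0))   # hypothetical dispersion matrix of (u_1, u_2)
u = (0.7, -1.3)
v, g2 = transform(u, R)
print(sum(x * x for x in v), sum(x * x for x in u))
print(sum(vi * vi / gi for vi, gi in zip(v, g2)), quad_form_inverse(u, R))
```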
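2.5. Now by virtue of (2.4.1) and (2.4.2),
$$\sum_{i=1}^{s}(U_i - M_i)^2 = \sum_{i=1}^{s}(u_i + M_i' - M_i)^2 = \sum_{i=1}^{s}(v_i - d_i)^2, \tag{2.5.1}$$
where the constants $d_i$ (i = 1, 2, ..., s) are obtained from the differences between the $M_i$ and the $M_i'$ by the same transformation (their defining relation is illegible in the source). Also, by virtue of (2.4.1),
$$\sum_{i=1}^{s} V(U_i) = \sum_{i=1}^{s} V(u_i) = \sum_{i=1}^{s} g_i^2 = \sum_{j=1}^{k} C_j\,\frac{\sigma_j^2}{n_j}. \tag{2.5.2}$$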
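2.6. From (2.5.1) and (2.5.2), the probability of the inequality
$$\sum_{i=1}^{s}(U_i - M_i)^2 > \sum_{j=1}^{k}\lambda_j\,C_j\,\frac{s_j^2}{n_j}$$
is equal to the probability of the inequality
$$\sum_{i=1}^{s}\frac{g_i^2}{\sum_{l=1}^{s}V(U_l)}\cdot\frac{(v_i - d_i)^2}{g_i^2} > \sum_{j=1}^{k}\lambda_j\,\frac{C_j\sigma_j^2/n_j}{\sum_{l=1}^{s}V(U_l)}\cdot\frac{s_j^2}{\sigma_j^2},$$
which is equal to the probability of
$$\sum_{i=1}^{s}\beta_i\,\chi'^2_{1i} > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j},$$
where the $\chi'^2_{1i}$ (i = 1, 2, ..., s) are non-central χ²-variates with 1 d.f., the $\chi_j^2$ are χ²-variates with $\nu_j = n_j - 1$ d.f. (j = 1, 2, ..., k), and $\beta_i$ and $\omega_j$ are positive weights defined as
$$\beta_i = \frac{g_i^2}{\sum_{l=1}^{s}V(U_l)}, \qquad \omega_j = \frac{C_j\sigma_j^2/n_j}{\sum_{l=1}^{s}V(U_l)}.$$
If the hypothesis $H_0$ is true, the $\chi'^2_{1i}$ (i = 1, 2, ..., s) are, however, distributed as central χ²-variates with 1 d.f.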
2.7. The crux of the problem of having a test statistic for testing the hypothesis $H_0$, based on the test variates $U_i$ (i = 1, 2, ..., s), therefore boils down to finding positive constants $\lambda_j$ (j = 1, 2, ..., k) such that
$$\text{prob}\Big[\sum_{i=1}^{s}\beta_i\,\chi_{1i}^2 > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}\Big] \le \alpha, \tag{2.7.1}$$
where $\chi_{1i}^2$ (i = 1, 2, ..., s) and $\chi_j^2$ (j = 1, 2, ..., k) are all independently distributed χ²-variates with respectively 1 and $\nu_j$ (j = 1, 2, ..., k) d.f., and $\beta_i$ and $\omega_j$ are positive weights adding up to unity. First, however, it has to be proved that it is at all possible to find finite positive constants $\lambda_j$ (j = 1, 2, ..., k) such that, given some pre-assigned α, (2.7.1) would be satisfied.
2.8. Theorem: Let $U_i$ (i = 1, 2, ..., s) be a set of stochastic variates (not necessarily independently distributed) which satisfy the relations
$$\text{prob}[U_i < 0] \le \alpha_i \quad (i = 1, 2, \ldots, s).$$
Now if $\beta_i$ (i = 1, 2, ..., s) is a set of arbitrary positive weights adding up to unity (i.e. $\sum_{i=1}^{s}\beta_i = 1$), then
$$\text{prob}\Big[\sum_{i=1}^{s}\beta_i U_i < 0\Big] \le \sum_{i=1}^{s}\alpha_i. \tag{2.8.1}$$
Proof: First, let us consider the case of only two variates, $U_1$ and $U_2$. If $\beta_1$ and $\beta_2$ are two positive weights adding up to unity, then $\beta_1 U_1 + \beta_2 U_2 < 0$ implies that at least one of $U_1$, $U_2$ is negative, so that
$$\text{prob}[\beta_1 U_1 + \beta_2 U_2 < 0] \le \text{prob}[U_1 < 0] + \text{prob}[U_2 < 0] \le \alpha_1 + \alpha_2.$$
Similarly it can be proved that
$$\text{prob}\Big[\sum_{i=1}^{s}\beta_i U_i < 0\Big] \le \sum_{i=1}^{s}\alpha_i. \tag{2.8.2}$$
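The key step of the proof (a negative weighted sum with positive weights forces at least one negative term) is a union bound and needs no independence. The simulation below is a hypothetical illustration, not part of the paper: it takes two correlated normal variates $U_1 = Z_1 + 1.96$ and $U_2 = Z_2 + 2.3263$, for which prob$[U_1 < 0] \approx 0.025$ and prob$[U_2 < 0] \approx 0.01$; the correlation, weights and seed are arbitrary choices. It asserts the implication on every draw and reports the three estimated probabilities.

```python
import math
import random


def union_bound_demo(shifts=(1.96, 2.3263), weights=(0.3, 0.7), rho=-0.5,
                     trials=100_000, seed=7):
    """Estimate prob[U_1 < 0], prob[U_2 < 0] and prob[w_1 U_1 + w_2 U_2 < 0]
    for correlated normal variates U_i = Z_i + shift_i."""
    rng = random.Random(seed)
    neg1 = neg2 = neg_comb = 0
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u1, u2 = z1 + shifts[0], z2 + shifts[1]
        comb_negative = weights[0] * u1 + weights[1] * u2 < 0
        # With positive weights, a negative combination forces some U_i < 0.
        assert not comb_negative or u1 < 0 or u2 < 0
        neg1 += u1 < 0
        neg2 += u2 < 0
        neg_comb += comb_negative
    return neg1 / trials, neg2 / trials, neg_comb / trials


p1, p2, p_comb = union_bound_demo()
print(p1, p2, p_comb, p1 + p2)
```

The combined probability is typically far below $\alpha_1 + \alpha_2$; the bound is what matters here because it holds whatever the dependence between the $U_i$.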
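2.9. Now let $U_i$ (i = 1, 2, ..., s) be defined as
$$U_i = \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j} - \chi_{1i}^2, \tag{2.9.1}$$
where $\chi_{1i}^2$ (i = 1, 2, ..., s) and $\chi_j^2$ (j = 1, 2, ..., k) are all independently distributed χ²-variates with respectively 1 and $\nu_j$ (j = 1, 2, ..., k) d.f., and $\lambda_j$ (j = 1, 2, ..., k) are the squares of the 100(α/s) per cent points of Student's t-table with $\nu_j$ d.f., so that (Banerjee, 1960)
$$\text{prob}[U_i < 0] \le \alpha/s. \tag{2.9.2}$$
From (2.8.1) and (2.8.2) it follows that
$$\text{prob}\Big[\sum_{i=1}^{s}\beta_i\,\chi_{1i}^2 > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}\Big] \le \alpha. \tag{2.9.3}$$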


3. STATEMENT OF THE STATISTIC

3.1. Let the $M_{s,k}$-statistic (M after Mahalanobis), of size α (or with maximum value α of the error of the first kind), for testing a hypothesis about s linear functions of population means without any a priori knowledge of the population variances, be defined as
$$M_{s,k} = \frac{\sum_{i=1}^{s}\beta_i\,\chi_{1i}^2}{\sum_{j=1}^{k}\lambda_j\,\omega_j\,\chi_j^2/\nu_j},$$
where $\chi_{1i}^2$ (i = 1, 2, ..., s) and $\chi_j^2$ (j = 1, 2, ..., k) are independently distributed χ²-variates with respectively 1 and $\nu_j$ (j = 1, 2, ..., k) d.f., $\beta_i$ and $\omega_j$ (i = 1, 2, ..., s; j = 1, 2, ..., k) are sets of positive weights adding up to unity, and $\lambda_j$ (j = 1, 2, ..., k) are irreducible positive constants which have been so determined that
$$\text{prob}\Big[\sum_{i=1}^{s}\beta_i\,\chi_{1i}^2 > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}\Big]$$
is less than or equal to α for all possible values of $\beta_i$ and $\omega_j$ (i = 1, 2, ..., s; j = 1, 2, ..., k).
4. CRITICAL VALUES OF M-STATISTIC

4.1. Let us consider the problem of finding critical values of the M-statistic for the case s = 2 and any k. The problem of finding critical values of $M_{2,k}$ amounts to finding the minimum possible numerical values of $\lambda_j$ (j = 1, 2, ..., k) such that
$$\text{prob}\Big[\beta_1\chi_{11}^2 + \beta_2\chi_{12}^2 > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}\Big] \le \alpha. \tag{4.1.1}$$
If P denotes the probability of the inequality
$$\sum_{i=1}^{2}\beta_i\,\chi_{1i}^2 > \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}, \tag{4.1.2}$$
we have
[Equation (4.1.3), which expresses P as a multiple integral over the frequency functions of $\chi_{11}^2$, $\chi_{12}^2$ and the $\chi_j^2$, is illegible in the source.]
where $f(\chi_{1i}^2)$ (i = 1, 2) denotes the frequency function of a χ²-variate with 1 d.f., and
$$z = \sum_{j=1}^{k}\lambda_j\,\omega_j\,\frac{\chi_j^2}{\nu_j}.$$

4.2. The inner integral of (4.1.3) over $\chi_{11}^2$ and $\chi_{12}^2$, in which $\chi_{12}^2$ runs upwards from $(z - \beta_1\chi_{11}^2)/\beta_2$, is an upward convex function of z (Courant, 1957) (details in Appendix A.1), which gives the inequality (4.2.1) (illegible in the source). From (4.1.1), (4.1.2) and (4.1.3) it follows that P is bounded above by an expression (4.2.2) whose terms are defined in (4.2.3); these equations are illegible in the source.

4.3. For $\nu_j$ = 1, 2, the integral (4.3.1) (illegible in the source), for variation in $\beta_1$ and $\beta_2$, is always less than or equal to α, where $F_{2,\nu_j,\alpha}$ is the tabulated value of the F-table corresponding to the 100α per cent point, with 2 d.f. for the greater mean square and $\nu_j$ d.f. for the smaller mean square ($\nu_j$ = 1, 2). (Details in Appendix A.2.)


4.4. Also, for the case $\nu_j \ge 3$ and α = 0.05, 0.02, 0.01, etc., the integral (4.4.1) (illegible in the source), for variation in $\beta_1$ and $\beta_2$, is always less than or equal to α, where $F_{1,\nu_j,\alpha}$ is the tabulated value of the F-table corresponding to the 100α per cent point, with 1 d.f. for the greater mean square and $\nu_j$ d.f. for the smaller mean square ($\nu_j \ge 3$). (Details in Appendix A.2.)

4.5. Numerical values of $\lambda_j$ of the $M_{2,k}$ test can thus be determined from tabulated values of the F-table: $\lambda_j = F_{2,\nu_j,\alpha}$ for $\nu_j$ = 1, 2 and $\lambda_j = F_{1,\nu_j,\alpha}$ for $\nu_j \ge 3$. Table 1 below gives numerical values of $\lambda_j$ of the $M_{2,k}$ test of size 0.05 and d.f. $\nu_j$ = 1, 2, ..., 20. The values have been taken from the F-table.


TABLE 1. NUMERICAL VALUES OF $\lambda_j$ OF $M_{2,k}$ TEST OF SIZE 0.05

ν_j      λ_j       ν_j      λ_j
 1     200.00       11     4.84
 2      19.00       12     4.75
 3      10.13       13     4.67
 4       7.71       14     4.60
 5       6.61       15     4.54
 6       5.99       16     4.49
 7       5.59       17     4.45
 8       5.32       18     4.41
 9       5.12       19     4.38
10       4.96       20     4.35
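Since sections 4.3 and 4.4 identify the $\lambda_j$ with tabulated F-points, Table 1 can be regenerated. The sketch below uses only the Python standard library and hypothetical function names. $F_{2,\nu,\alpha}$ has the closed form $(\nu/2)(\alpha^{-2/\nu} - 1)$, because prob$[F_{2,\nu} > x] = (1 + 2x/\nu)^{-\nu/2}$; $F_{1,\nu,\alpha}$ is the square of the two-sided 100α per cent point of Student's t, found here by bisection on a tail probability computed with Simpson's rule.

```python
import math


def t_two_sided_tail(x, nu, steps=400):
    """prob[|T| > x] for Student's t with nu d.f., by Simpson's rule on [0, x] (steps even)."""
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    density = lambda u: const * (1.0 + u * u / nu) ** (-(nu + 1) / 2)
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 1.0 - 2.0 * total * h / 3.0


def f1_point(nu, alpha):
    """Upper 100*alpha % point of F(1, nu), i.e. the squared two-sided t point."""
    lo, hi = 0.0, 1.0
    while t_two_sided_tail(hi, nu) > alpha:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if t_two_sided_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2


def f2_point(nu, alpha):
    """Upper 100*alpha % point of F(2, nu) in closed form."""
    return 0.5 * nu * (alpha ** (-2.0 / nu) - 1.0)


def lambda_j(nu, alpha=0.05):
    """lambda_j of the M_{2,k} test as described in sections 4.3 and 4.4."""
    return f2_point(nu, alpha) if nu <= 2 else f1_point(nu, alpha)


for nu in range(1, 21):
    print(nu, round(lambda_j(nu), 2))
```

For $\nu_j = 1$ the closed form gives 199.5, which Table 1 rounds to 200.00; the remaining values match standard 5 per cent F-tables to two decimals. With SciPy available, the same numbers come from `scipy.stats.f.ppf(1 - alpha, dfn, dfd)`.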
5. TESTING EQUALITY OF POPULATION MEANS

5.1. Given k samples from normal populations $N_i(m_i, \sigma_i^2)$, to test the equality of the k population means, k − 1 linear functions $L_i$ (i = 1, 2, ..., k − 1) of the means are considered.
[Equations (5.1.1), defining the linear functions $L_i$ and the hypothesis $L_i = 0$ (i = 1, 2, ..., k − 1), are illegible in the source.]
If $s_i^2$ denotes the estimate of the population variance of the i-th population,
[Equation (5.1.2), defining $M_{k-1,k}$, is illegible in the source.]
with suitable choice of $\lambda_j$ (j = 1, 2, ..., k), and the hypothesis is rejected if the numerical value of $M_{k-1,k}$ as defined in (5.1.2) exceeds unity.


6. NUMERICAL EXAMPLE

Suppose samples from three populations supply the following estimates.

TABLE 2
                                population
                              1        2        3
sample mean x̄_j             15.0     20.0     10.0
sample variance s_j²         18.0      5.5     20.0
sample size n_j                 3       11       21

The test variates $U_1$ and $U_2$ are normalized contrasts of the sample means; their definitions are only partly legible in the source. The $M_{2,3}$-statistic of size 0.05 may be computed as
$$M_{2,3} = \frac{U_1^2 + U_2^2}{\sum_{j=1}^{3}\lambda_j\,C_j\,s_j^2/n_j},$$
in which the denominator is built from the terms $19.00 \times 18.0/3 = 114.00$, $4.96 \times 5.5/11 = 2.48$ and $4.35 \times 20.0/21 = 4.14$ (the intermediate arithmetic is otherwise illegible in the source). Here the numerical values of $\lambda_j$ (j = 1, 2, 3) have been taken from Table 1 above, with $\nu_j = n_j - 1$ = 2, 10, 20. Since $M_{2,3}$ is greater than unity, the hypothesis of equality of means is rejected.


7. THE CASE OF MULTIVARIATE POPULATIONS

7.1. Let k samples of $N_i$ (i = 1, 2, ..., k) units be drawn from k p-variate normal populations having dispersion matrices $\Sigma_i$ (i = 1, 2, ..., k), which are not necessarily equal. Let $\bar x_{ij}$ and $m_{ij}$ denote the sample mean and population mean of the j-th character of the i-th population. Also let $s_{ij}^2$ and $\sigma_{ij}^2$ denote the sample and population variance of the j-th character of the i-th population. To test the hypothesis that
$$\sum_{i=1}^{k} c_i\,m_{ij} = M_j \quad (j = 1, 2, \ldots, p), \tag{7.1.1}$$
the M-statistic may be defined as
[Equation (7.1.2) is illegible in the source.]
with suitable choice of $\lambda_i$ (i = 1, 2, ..., k) depending upon the size of the test. It can be shown that M as defined in (7.1.2) is equal to
$$\frac{\sum_{j=1}^{p}\beta_j\,\chi_{1j}^2}{\sum_{i=1}^{k}\sum_{j=1}^{p}\lambda_i\,\omega_{ij}\,\chi_{ij}^2/(N_i - 1)}, \tag{7.1.3}$$
where $\chi_{1j}^2$ (j = 1, 2, ..., p) and $\chi_{ij}^2$ (i = 1, 2, ..., k; j = 1, 2, ..., p) are independently distributed χ²-variates, the $\chi_{1j}^2$ being distributed with 1 d.f. and the $\chi_{ij}^2$ with $N_i - 1$ d.f., and $\beta_j$ and $\omega_{ij}$ are sets of positive weights adding up to unity, i.e.
$$\sum_{j=1}^{p}\beta_j = 1 \quad\text{and}\quad \sum_{i=1}^{k}\sum_{j=1}^{p}\omega_{ij} = 1.$$
8. FURTHER NUMERICAL EXAMPLE

8.1. As an example of the likely use of the M-statistic in multivariate problems, let us consider Barnard's data on Egyptian skulls. Four measurements on four populations are summarised as follows.

TABLE 3. MEAN VALUES OF FOUR CHARACTERS

                                  character
population          I           II          III          IV
I                133.582      98.308      50.835     133.000
II               134.265      96.463      51.148     134.883
III              134.371      95.857      50.100     133.643
IV               135.307      95.040      52.093     131.467

The numbers of observations are $N_1$ = 91, $N_2$ = 162, $N_3$ = 70 and $N_4$ = 75, and the pooled corrected sums of squares of the four characters are (i) 9661.997, (ii) 9078.115, (iii) 3938.320 and (iv) 8741.509. Let $\bar x_{ij}$ and $m_{ij}$ denote the sample mean and population mean of the j-th character of the i-th population (i, j = 1, 2, 3, 4). Also let $s_j^2$ and $\sigma_j^2$ denote the sample and population variances of the j-th character. (Here the dispersion matrices of the populations have been assumed to be equal.) To test the hypothesis that
$$m_{1j} = m_{2j} = m_{3j} = m_{4j} \quad (j = 1, 2, 3, 4),$$
let test variates $U_{jk}$ (j = 1, 2, 3, 4; k = 1, 2, 3) be defined as
[Equations (8.1.1), defining $U_{j1}$, $U_{j2}$ and $U_{j3}$, are illegible in the source.]
On the basis of the test variates $U_{jk}$ (j = 1, 2, 3, 4; k = 1, 2, 3), the M-statistic may be computed as
[Equation (8.1.2) is illegible in the source.]
with suitable choice of λ depending on the size of the test. Taking the numerical value of λ equal to 3.86 (the tabulated 5 per cent point of the F-table corresponding to $\nu_1 = 1$ and $\nu_2 = 400$), the approximate numerical value of the M-statistic comes out as 1.49. Since the numerical value of the M-statistic exceeds unity, the hypothesis cannot be accepted.
Appendix A.1

Let
[Equation (A.1.1), which defines F(z) as the integral of Section 4.2 regarded as a function of z, is illegible in the source.]
where $\beta_1 + \beta_2 = 1$; $\beta_1, \beta_2 > 0$ and $\beta_2 > \beta_1$. We have
[Equations (A.1.2) to (A.1.9), which express the first and second derivatives of F(z) through auxiliary integrals, are illegible in the source.]
From (A.1.1) to (A.1.8) it follows that $d^2F(z)/dz^2$ is expressed through the integral $I_5$ (the expression is illegible in the source). As $I_5$ is negative, $d^2F(z)/dz^2$ is negative, so that F(z) is an upward convex function of z.
Appendix A.2

Let
[Equation (A.2.1), defining $F(\beta_1, \beta_2)$ as a double integral, is illegible in the source.]
where $A' = A/\nu$, $\beta_1 + \beta_2 = 1$ and $\beta_1, \beta_2 > 0$.
[Equations (A.2.2) and (A.2.3), expressions for $dF(\beta_1, \beta_2)/d\beta_1$, are illegible in the source.]
where $a_1 = \beta_1/A'$, $a_2 = \beta_2/A'$, $c_1 = \beta_1/(1 + a_1)$ and $c_2 = \beta_2/(1 + a_2)$.

Sub-case 1: For ν = 2, from (A.2.3),
$$\frac{dF(\beta_1,\beta_2)}{d\beta_1} = I_1 - I_2, \tag{A.2.4}$$
where $I_1$ and $I_2$ are double integrals (illegible in the source). From (A.2.4),
$$\frac{I_2}{I_1} = \frac{A' + \beta_1}{A' + \beta_2},$$
which is less than unity if $\beta_1 < \beta_2$, so that $F(\beta_1, \beta_2)$ increases as $\beta_1$ increases ($\beta_1 < \beta_2$). It can similarly be shown that if $\beta_1 > \beta_2$, $F(\beta_1, \beta_2)$ decreases as $\beta_1$ increases, and the function $F(\beta_1, \beta_2)$ has a maximum value at $\beta_1 = \beta_2 = 1/2$.

Sub-case 2: For ν = 1, from (A.2.3), for $\beta_1, \beta_2 > \varepsilon > 0$,
$$\frac{dF(\beta_1,\beta_2)}{d\beta_1} = I_1 - I_2, \tag{A.2.6}$$
where $I_1$ and $I_2$ are double integrals in variates $u_1$, $u_2$ (illegible in the source). Defining variates $V_1 = u_1$ and $V_2 = u_2/u_1$, it can be shown that $I_1$ equals a single integral in $V_1$ (illegible in the source). For $\beta_1 < \beta_2$, defining $Z = 1/(1 + V_2)$, it can be shown that
$$I_1 = K(1 + a_1)^{-1}\,F(1/2, 3/2; 2; \lambda_1), \tag{A.2.7}$$
where $\lambda_1 = A'(A' + \beta_1)^{-1}(\beta_2 - \beta_1)/\beta_2$. Also, for $\beta_1 < \beta_2$ it can be shown that
$$I_2 = K(1 + a_2)^{-1}\,F(1/2, 1/2; 2; \lambda_1). \tag{A.2.8}$$
From (A.2.7) and (A.2.8) it thus follows that for $\beta_1 < \beta_2$
$$\frac{I_2}{I_1} = (A' + \beta_1)(A' + \beta_2)^{-1}\,\frac{F(1/2, 1/2; 2; \lambda_1)}{F(1/2, 3/2; 2; \lambda_1)}.$$
For $\beta_1 < \beta_2$, thus, $F(\beta_1, \beta_2)$ increases as $\beta_1$ increases. It can also be similarly shown that for $\beta_1 > \beta_2$, $F(\beta_1, \beta_2)$ decreases as $\beta_1$ increases, and the function has a maximum value at $\beta_1 = \beta_2 = 1/2$.


Sub-case 3: For ν ≥ 3, from (A.2.3),
$$\frac{dF(\beta_1,\beta_2)}{d\beta_1} = I_1 - I_2, \tag{A.2.10}$$
where $I_1$ and $I_2$ are the double integrals (A.2.11) and (A.2.12) (illegible in the source).
For $\beta_1 = 0$, (A.2.11) and (A.2.12) reduce to products of gamma functions, and the resulting ratio (A.2.13) of $I_1$ to $I_2$ is illegible in the source. From (A.2.13), $I_2$ would be greater than $I_1$ if
$$A'\nu = A > \nu/(\nu - 2). \tag{A.2.14}$$
Now for $0 < \beta_1 < \beta_2$, defining variates $V_2 = u_2$ and $V_1 = u_1/u_2$, it can be shown from (A.2.11) that $I_1$ equals a single integral (A.2.15) (illegible in the source), where $D_1 = c_1/c_2$ and $\mu = \nu/2 - 1$. From (A.2.15) it can be shown that $I_1$ is proportional to $F(\nu/2 + 1, 1/2; 2; \lambda_2)$ (A.2.16), where
$$\lambda_2 = 1 - D_1 = 1 - \beta_1\beta_2^{-1}(A' + \beta_2)(A' + \beta_1)^{-1}.$$
It can also be shown from (A.2.12) that for $0 < \beta_1 < \beta_2$, $I_2$ is proportional to $F(\nu/2 + 1, 3/2; 2; \lambda_2)$ (A.2.17). From (A.2.16) and (A.2.17) we thus have an expression (A.2.18) for $I_2/I_1$ as a multiple of $F(\nu/2 + 1, 3/2; 2; \lambda_2)/F(\nu/2 + 1, 1/2; 2; \lambda_2)$ (the multiplying factor is illegible in the source).
Now, according to the algebraic relations due to Gauss (Erdélyi, 1953) satisfied by contiguous hypergeometric functions, a relation (A.2.19) holds (illegible in the source).
From (A.2.18) and (A.2.19), $I_2$ would be greater than $I_1$ if an inequality (A.2.20) holds (illegible in the source), where E is a ratio involving $F(a, b+1; c+1; \lambda_2)$ and $F(a, b; c; \lambda_2)$, with
$$a = \nu/2 + 1, \quad b = 1/2, \quad c = 2,$$
and $\lambda_2 = 1 - \beta_1\beta_2^{-1}(A' + \beta_2)(A' + \beta_1)^{-1}$.
From (A.2.20), $I_2$ would be greater than $I_1$ if $A'(\beta_2 - \beta_1)E$ exceeds a certain bound, that is, if condition (A.2.21) holds (the intermediate steps and (A.2.21) itself are illegible in the source).
Now $\lambda_2$ and $\beta_2$ are connected as
$$\lambda_2 = 1 - \beta_1\beta_2^{-1}(A' + \beta_2)(A' + \beta_1)^{-1} = A'\beta_2^{-1}(\beta_2 - \beta_1)(A' + \beta_1)^{-1},$$
so that, since $\beta_1 = 1 - \beta_2$,
$$\lambda_2 = A'(A' + 1 - \beta_2)^{-1}(2\beta_2 - 1)\,\beta_2^{-1}. \tag{A.2.22}$$
For clarity of exposition, (A.2.21) will be considered under two heads:

Sub-case 1: $\beta_2$ lies in the range $3/4 > \beta_2 > 1/2$.

Sub-case 2: $\beta_2$ lies in the range $1 > \beta_2 \ge 3/4$.

For sub-case 1, from (A.2.21) it follows that, since $F(a, b+1; c+1; \lambda_2)/F(a, b; c; \lambda_2)$ is greater than or equal to unity, (A.2.21) would be satisfied if
$$A' = A/\nu > c\,(a - c)^{-1}\cdot 3/4 = 3/(\nu - 2). \tag{A.2.23}$$
For sub-case 2, since $\lambda_2$ from (A.2.22) would be greater than or equal to $\lambda_0 = A'(A' + 1/4)^{-1}\cdot 2/3$, (A.2.21) would be satisfied if an inequality (A.2.24) holds (illegible in the source). From (A.2.24) it follows that $I_2$ would be greater than $I_1$ if an inequality (A.2.25) holds (illegible in the source). Considering (A.2.25), an auxiliary function U (A.2.26) may be considered (its definition is illegible in the source). In (A.2.26), substituting $K/(\nu - 2)$ for A', we get after simplification
$$[\ldots] = K^2(2\nu + 16) - K(\nu + 62) - 12(\nu - 2), \tag{A.2.27}$$
where the left-hand side, a multiple of U, is illegible in the source.

Since the coefficient of $K^2$ of the quadratic on the right-hand side of (A.2.27) is positive, for values of K greater than its larger root the numerical value of the quadratic, and as such the numerical value of U, is positive. Let the roots of the quadratic
$$(2\nu + 16)K^2 - (\nu + 62)K - 12(\nu - 2) = 0 \tag{A.2.28}$$
be $K_1$ and $K_2$ (where $K_2 > K_1$). Now it can be shown that for ν ≥ 3, $K_2$ is less than a certain expression (A.2.29) (illegible in the source). Since that expression, for ν ≥ 3, is less than 16/5, it follows that U would be positive for K ≥ 16/5, which means that (A.2.25), or (A.2.24), would be satisfied for
$$A' = A/\nu > 3.2/(\nu - 2), \quad\text{or}\quad A > 3.2\,\nu/(\nu - 2). \tag{A.2.30}$$
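The claim behind (A.2.29) and (A.2.30), that the larger root $K_2$ of the quadratic $(2\nu + 16)K^2 - (\nu + 62)K - 12(\nu - 2)$ in (A.2.28) stays below 16/5 for every ν ≥ 3, is easy to confirm numerically. A direct argument, which the sketch below checks (function names hypothetical), is that the quadratic evaluated at K = 3.2 simplifies to 5.28(ν − 2), which is positive for ν ≥ 3; since the product of the roots is negative, $K_1 < 0 < K_2 < 3.2$.

```python
import math


def quadratic(K, nu):
    """Left-hand side of (A.2.28)."""
    return (2 * nu + 16) * K * K - (nu + 62) * K - 12 * (nu - 2)


def larger_root(nu):
    """K_2, the larger root of (A.2.28), from the quadratic formula."""
    a, b, c = 2 * nu + 16, -(nu + 62), -12 * (nu - 2)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


for nu in (3, 4, 10, 100, 10_000):
    print(nu, round(larger_root(nu), 4), quadratic(3.2, nu))
```

As ν grows, $K_2$ approaches $(1 + \sqrt{97})/4 \approx 2.71$, so the margin below 16/5 is smallest near ν = 3, where $K_2 \approx 3.13$.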


From (A.2.14), (A.2.23) and (A.2.25) it thus follows that for $0 < \beta_1 < \beta_2$, $I_2$ would be greater than $I_1$ if
$$A > 3.2\,\nu/(\nu - 2). \tag{A.2.31}$$
The function $F(\beta_1, \beta_2)$ thus decreases as $\beta_1$ increases, for $\beta_1 < \beta_2$, if A is greater than or equal to $3.2\nu/(\nu - 2)$. It can also be similarly shown that for $\beta_1 > \beta_2$, $F(\beta_1, \beta_2)$ increases as $\beta_1$ increases if A is numerically greater than or equal to $3.2\nu/(\nu - 2)$, and the function has a minimum value at $\beta_1 = \beta_2 = 1/2$ and a maximum value at $\beta_1 = 0$ and $\beta_2 = 1$.


SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A 


Since the critical values of the F-table for 1 and ν (ν ≥ 3) d.f. at the 5 per cent, 2 per cent, 1 per cent, etc. levels of significance are all greater than $3.2\nu/(\nu - 2)$ [a relation which can be proved using the algebraic relation due to Fisher (1941, page 151, middle)], the relation
$$\beta_1\chi_{11}^2 + \beta_2\chi_{12}^2 > A\,\chi_\nu^2/\nu$$
would be satisfied with probability less than or equal to α for α = 0.05, 0.02, 0.01, etc. and ν ≥ 3, if A is taken from the F-table corresponding to 1 and ν d.f. for the given α.
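The inequality used here, $F_{1,\nu,\alpha} > 3.2\nu/(\nu - 2)$ for ν ≥ 3, can be spot-checked without computing any percentage point: $F_{1,\nu,\alpha} > b$ holds exactly when prob$[|T_\nu| > \sqrt{b}] > \alpha$, where $T_\nu$ is Student's t with ν d.f. The sketch below (standard library only, hypothetical names, Simpson's rule for the t tail) evaluates that tail at $\sqrt{3.2\nu/(\nu - 2)}$ for ν = 3, ..., 200. It is a finite check, not a proof; the tightest case is ν = 3 at α = 0.05, where the tail is about 0.053.

```python
import math


def t_two_sided_tail(x, nu, steps=400):
    """prob[|T| > x] for Student's t with nu d.f., by Simpson's rule on [0, x] (steps even)."""
    const = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    density = lambda u: const * (1.0 + u * u / nu) ** (-(nu + 1) / 2)
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 1.0 - 2.0 * total * h / 3.0


def exceeds_bound(nu, alpha):
    """True when the upper 100*alpha % point of F(1, nu) exceeds 3.2 nu / (nu - 2)."""
    return t_two_sided_tail(math.sqrt(3.2 * nu / (nu - 2)), nu) > alpha


failures = [(nu, a) for nu in range(3, 201) for a in (0.05, 0.02, 0.01)
            if not exceeds_bound(nu, a)]
print("failures:", failures)
print("tightest case:", t_two_sided_tail(math.sqrt(9.6), 3))
```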
A SEQUENTIAL TEST OF FIT FOR MULTIVARIATE
DISTRIBUTIONS*


LIONEL WEISS
Cornell University


being tested. The event $B_n$ is defined as occurring if $X_{n+1}$ falls in one of the n spheres. The events $B_1, B_2, \ldots$ are treated formally as though they are independent events with a common unknown probability p, and the Wald sequential test of the hypothesis that p is equal to $p_0$ against the alternative that p is


1. DESCRIPTION OF THE TEST


$X_1, X_2, \ldots$ are independent, identically distributed k-dimensional random variables, with an unknown common probability density function. The hypothesis to be tested is that the unknown probability density function is g(x), a completely specified density function. Here x denotes a k-dimensional vector, as it will throughout this paper. Let C denote the closed k-dimensional unit cube. It will be convenient to apply the transformation described by Rosenblatt (1952) to each of the variables $X_1, X_2, \ldots$, so that when the hypothesis is true, each transformed variable has the uniform distribution over C. This transformation also guarantees that each transformed variable will be in C whether the hypothesis is true or not. From now on we assume that this transformation has been applied to each of the variables $X_1, X_2, \ldots$, so that all probability density functions considered assign probability one to C, and we are testing the hypothesis that the common probability density function of $X_1, X_2, \ldots$ is uniform over C. The pre-assigned level of significance will be denoted by α.

$X_1, X_2, \ldots$ are observed sequentially. After $X_1, \ldots, X_n$ have been observed, we define a subset $S_n(t)$ of C as follows, for t positive. A point x of C is in $S_n(t)$ if and only if the closed k-dimensional sphere with center at x and with volume t/n contains at least one of the points $X_1, \ldots, X_n$.

We choose values $p_0$, $p_1$ and β, where $0 < p_0 < p_1 < 1$ and $0 < \beta < 1 - \alpha$, and hold them fixed. We define T(n) as the value of t for which the volume of the set $S_n(t)$ is exactly equal to $p_0$. Clearly, T(n) is uniquely defined. We define the event $B_n$ as occurring when and only when $X_{n+1}$ is in $S_n(T(n))$.
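For k = 1 the construction is easy to make concrete: a "sphere" of volume t/n is an interval of length t/n, so x is in $S_n(t)$ exactly when some $X_i$ lies within t/(2n) of x. The sketch below (hypothetical names, standard library only) computes the volume of $S_n(t)$ as the length of a union of intervals clipped to [0, 1], finds T(n) by bisection (the volume is continuous and non-decreasing in t, and equals 1 at t = 2n), and decides whether $B_n$ occurs. Under the hypothesis of uniformity, $X_{n+1}$ is independent of $S_n(T(n))$, so $B_n$ has probability exactly $p_0$, which the closing simulation illustrates.

```python
import random


def union_length(points, half_width):
    """Length of the union of [p - w, p + w], each clipped to [0, 1]."""
    intervals = sorted((max(0.0, p - half_width), min(1.0, p + half_width)) for p in points)
    total = 0.0
    cur_lo, cur_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo > cur_hi:                 # disjoint: close the current block
            total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                           # overlapping: extend the current block
            cur_hi = max(cur_hi, hi)
    return total + (cur_hi - cur_lo)


def volume_S(points, t):
    """Volume of S_n(t) for k = 1 (intervals of length t/n)."""
    return union_length(points, t / (2.0 * len(points)))


def T_of_n(points, p0, rel_tol=1e-12):
    """The t at which volume(S_n(t)) = p0, by bisection on [0, 2n]."""
    lo, hi = 0.0, 2.0 * len(points)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if volume_S(points, mid) < p0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def B_occurs(points, next_point, p0):
    """B_n: X_{n+1} lies in S_n(T(n)), i.e. within T(n)/(2n) of some X_i."""
    w = T_of_n(points, p0) / (2.0 * len(points))
    return min(abs(next_point - p) for p in points) <= w


rng = random.Random(11)
p0, n, trials = 0.5, 30, 1000
hits = 0
for _ in range(trials):
    pts = [rng.random() for _ in range(n + 1)]
    hits += B_occurs(pts[:n], pts[n], p0)
print(hits / trials)   # close to p0 when the points are uniform
```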

The test of fit is carried out by acting as though $B_1, B_2, \ldots$ are independent events, each with the same unknown probability p, and using the Wald sequential test of the hypothesis that $p = p_0$ against the alternative that $p = p_1$, with level of significance α and power 1 − β. To be more specific, for each positive integer m let $D_m$ denote the number of events among $B_1, B_2, \ldots, B_m$ which occur, and define (Wald, 1947, p. 92)
$$a_m = \frac{\log[\beta/(1-\alpha)]}{\log(p_1/p_0) - \log[(1-p_1)/(1-p_0)]} + m\,\frac{\log[(1-p_0)/(1-p_1)]}{\log(p_1/p_0) - \log[(1-p_1)/(1-p_0)]},$$
$$r_m = \frac{\log[(1-\beta)/\alpha]}{\log(p_1/p_0) - \log[(1-p_1)/(1-p_0)]} + m\,\frac{\log[(1-p_0)/(1-p_1)]}{\log(p_1/p_0) - \log[(1-p_1)/(1-p_0)]}.$$
Sampling continues as long as $a_m < D_m < r_m$. The first time that these inequalities do not hold, we accept the hypothesis of uniform distribution if $D_m \le a_m$, and reject the hypothesis if $D_m \ge r_m$.

*Research supported by National Science Foundation, Grant No. NSF-G11321.
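The decision rule translates directly into code. The sketch below (hypothetical names; the 0/1 outcomes stand for whether each $B_m$ occurred) computes Wald's acceptance number $a_m$ and rejection number $r_m$ from $p_0$, $p_1$, α and β exactly as in the formulas for this test, and stops at the first m at which $a_m < D_m < r_m$ fails. The parameter values in the example are arbitrary.

```python
import math


def wald_limits(m, p0, p1, alpha, beta):
    """Acceptance number a_m and rejection number r_m of the sequential test."""
    denom = math.log(p1 / p0) - math.log((1 - p1) / (1 - p0))
    slope = math.log((1 - p0) / (1 - p1)) / denom
    a_m = math.log(beta / (1 - alpha)) / denom + m * slope
    r_m = math.log((1 - beta) / alpha) / denom + m * slope
    return a_m, r_m


def sequential_test(outcomes, p0, p1, alpha, beta):
    """Return ('accept' | 'reject' | 'continue', number of events examined)."""
    d = 0
    for m, occurred in enumerate(outcomes, start=1):
        d += int(occurred)
        a_m, r_m = wald_limits(m, p0, p1, alpha, beta)
        if d <= a_m:
            return "accept", m
        if d >= r_m:
            return "reject", m
    return "continue", len(outcomes)


print(sequential_test([1] * 20, p0=0.5, p1=0.7, alpha=0.05, beta=0.1))  # ('reject', 9)
print(sequential_test([0] * 20, p0=0.5, p1=0.7, alpha=0.05, beta=0.1))  # ('accept', 5)
```

Because $p_1 > p_0$, a run of occurrences of the $B_m$ (points of the sample falling close to earlier points more often than uniformity allows) pushes $D_m$ towards the rejection line.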

This test of fit is an extension of a sequential test for the case k = 1 which 
has been discussed by the author (Weiss, 1961). In Section 3 we discuss the 
properties of the test, and in the next section we develop a basic theorem which 
will be used to develop the properties of the test. 


2. A BASIC THEOREM


In this section we state and prove a theorem that will be basic for our investigation of the properties of the test.

A probability density function f(x) will be said to have the property R if f(x) assigns probability 1 to C, is bounded over C, and if for each point x in the interior of C the probability assigned by f(x) to a sphere centered at x, of volume v and entirely contained in C, can be written as $f(x)v + R(x, v)v^{1+1/k}$, where $|R(x, v)| < K_1$ for all x in C and for all $v < K_2$, where $K_1$ and $K_2$ are some finite constants. We note that the uniform density function has the property R, as does any bounded density function with bounded continuous first partial derivatives in the interior of C. The symbols dx and dy will be understood to denote $dx_1 \cdots dx_k$ and $dy_1 \cdots dy_k$ respectively. All regions of integration considered will be subsets of C; in particular, if an indicated region of integration is not a subset of C, the actual region of integration is to be understood to be the intersection of the indicated region with C. For given functions h(x), f(x), and a given positive value t, the integral $\int_C h(x)\exp[-t f(x)]\,dx$ will be denoted by A(t; f, h).

The following theorem is basic.

Theorem: If the common probability density function of $X_1, X_2, \ldots$ is f(x), where f(x) has the property R, and h(x) is any bounded probability density function over C, then $\int_{S_n(t)} h(x)\,dx$ converges to $1 - A(t; f, h)$ with probability 1 as n increases, for any given positive t.
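A simple special case makes the statement concrete. Take k = 1 and f = h = 1 on C = [0, 1] (the uniform case, which has the property R); then $A(t; f, h) = \int_0^1 e^{-t}\,dx = e^{-t}$, so the length of $S_n(t)$ should approach $1 - e^{-t}$. The Monte Carlo sketch below (hypothetical names, standard library only) checks this for one large n; it illustrates the limit and does not prove it.

```python
import math
import random


def union_length(points, half_width):
    """Length of the union of [p - w, p + w], each clipped to [0, 1]."""
    intervals = sorted((max(0.0, p - half_width), min(1.0, p + half_width)) for p in points)
    total = 0.0
    cur_lo, cur_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo > cur_hi:
            total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    return total + (cur_hi - cur_lo)


def volume_S(n, t, seed=0):
    """Volume of S_n(t) for n uniform points in [0, 1] (intervals of length t/n)."""
    rng = random.Random(seed)
    points = [rng.random() for _ in range(n)]
    return union_length(points, t / (2.0 * n))


for t in (0.5, 1.0, 2.0):
    print(t, round(volume_S(100_000, t), 4), round(1 - math.exp(-t), 4))
```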
Proof: Throughout the proof, the quantities $\theta_1, \theta_2, \ldots$ will denote finite positive constants, whose actual values will not have to be specified.

Fix a positive value of t. Break C into two mutually exclusive parts: $C_1$ consists of those points x of C whose distance to the nearest boundary of C is greater than $[\Gamma(k/2 + 1)\,t/(\pi^{k/2} n)]^{1/k}$ (the radius of a sphere of volume t/n), and $C_2$ consists of all other points of C. We note that if x is any point in $C_1$, the sphere of volume t/n centered at x is entirely contained in C. The volume of $C_2$ is $1 - \{1 - 2[\Gamma(k/2 + 1)\,t/(\pi^{k/2} n)]^{1/k}\}^k$, which is less than $\theta_1 (t/n)^{1/k}$.

Denote by F(x, t/n) the probability assigned by f(x) to a sphere centered at x and of volume t/n. Since f(x) has the property R, we have $F(x, t/n) = f(x)t/n + R(x, t/n)(t/n)^{1+1/k}$ for each x in $C_1$, where $|R(x, t/n)| < \theta_2$ for all x in $C_1$ and all sufficiently large n.

For each x in C, we define $I_n(x)$ to be equal to zero if at least one of $X_1, \ldots, X_n$ falls in the sphere of volume t/n centered at x, and define $I_n(x)$ to be equal to one if none of $X_1, \ldots, X_n$ falls in the sphere of volume t/n centered at x. Then
$$\int_{S_n(t)} h(x)\,dx = 1 - \int_C h(x)\,I_n(x)\,dx. \tag{2.1}$$
From Kolmogoroff [(1946), pp. 39-41] (see also Robbins (1944)), we have
$$E\Big\{\int_C h(x)\,I_n(x)\,dx\Big\} = \int_C h(x)\,E\{I_n(x)\}\,dx = \int_C h(x)\,[1 - F(x, t/n)]^n\,dx. \tag{2.2}$$


We can write
$$\int_C h(x)[1 - F(x, t/n)]^n\,dx = \int_{C_1} h(x)\big[1 - f(x)t/n - R(x, t/n)(t/n)^{1+1/k}\big]^n dx + \int_{C_2} h(x)[1 - F(x, t/n)]^n\,dx. \tag{2.3}$$
Clearly, since h(x) is bounded, the second integral on the right side of (2.3) is non-negative and less than $\theta_3 (t/n)^{1/k}$. We investigate the first integral on the right side of equation (2.3). We have
$$\log\big[1 - f(x)t/n - R(x, t/n)(t/n)^{1+1/k}\big]^n = n\log\big[1 - f(x)t/n - R(x, t/n)(t/n)^{1+1/k}\big] = -t f(x) + R_1(x, n)/n^{1/k}, \tag{2.4}$$
where $|R_1(x, n)| < \theta_4$ for all x in $C_1$ and all sufficiently large n. Therefore, the first integral on the right side of equation (2.3) can be written as
$$\int_{C_1} h(x)\exp\big[-t f(x) + R_1(x, n)/n^{1/k}\big]dx = \int_{C_1} h(x)\big[1 + R_2(x, n)/n^{1/k}\big]\exp[-t f(x)]\,dx, \tag{2.5}$$
where $|R_2(x, n)| < \theta_5$ for all x in $C_1$ and all sufficiently large n. From (2.5) it follows that the first integral on the right side of equation (2.3) can be written as
$$\int_{C_1} h(x)\exp[-t f(x)]\,dx + R_3(n)/n^{1/k}, \tag{2.6}$$
where $|R_3(n)| < \theta_6$ for all sufficiently large n. Also, because of the bound on the volume of $C_2$ given above, the integral in (2.6) differs from A(t; f, h) by less than $\theta_7/n^{1/k}$. Then we have from (2.1) to (2.6)
$$E\Big\{\int_C h(x)\,I_n(x)\,dx\Big\} = A(t; f, h) + R_4(n)/n^{1/k}, \tag{2.7}$$
where $|R_4(n)| < \theta_8$ for all sufficiently large n.
Next we investigate
$$E\Big\{\Big[\int_C h(x)\,I_n(x)\,dx - A(t; f, h)\Big]^2\Big\}, \tag{2.8}$$
which can be written as
$$E\Big\{\Big[\int_C h(x)\,I_n(x)\,dx\Big]^2\Big\} - 2A(t; f, h)\,E\Big\{\int_C h(x)\,I_n(x)\,dx\Big\} + A^2(t; f, h). \tag{2.9}$$
The first term of (2.9) can be written as
$$E\Big\{\int_C\int_C h(x)h(y)\,I_n(x)I_n(y)\,dy\,dx\Big\} = \int_C\int_C h(x)h(y)\,E\{I_n(x)I_n(y)\}\,dy\,dx. \tag{2.10}$$
For our investigation of (2.10) we introduce the following notation. F(x, y, n) denotes the probability assigned by f(x) to the set-theoretic sum of the spheres of volume t/n centered at x, y respectively. Then
$$E\{I_n(x)I_n(y)\} = [1 - F(x, y, n)]^n. \tag{2.11}$$
For a given point x in $C_1$, we denote by $C_1(x)$ the set of all points y which are in $C_1$ and which are farther from x than $2[\Gamma(k/2 + 1)\,t/(\pi^{k/2} n)]^{1/k}$. We note that if x is in $C_1$ and y is in $C_1(x)$, the spheres of volume t/n centered at x, y respectively are disjoint and are both contained in C, and therefore
$$[1 - F(x, y, n)]^n = \big[1 - f(x)t/n - f(y)t/n - R(x, t/n)(t/n)^{1+1/k} - R(y, t/n)(t/n)^{1+1/k}\big]^n. \tag{2.12}$$
Denote by $C_2(x)$ the set of all points y which are in C but not in $C_1(x)$. The volume of $C_2(x)$ is no greater than the sum of the volume of $C_2$ and the volume of a sphere of radius $2[\Gamma(k/2 + 1)\,t/(\pi^{k/2} n)]^{1/k}$, and it follows easily that the volume of $C_2(x)$ is less than $\theta_9/n^{1/k}$.
We can write (2.10) as
$$\int_{C_1}\int_{C_1(x)} h(x)h(y)[1 - F(x, y, n)]^n\,dy\,dx + \int_{C_1}\int_{C_2(x)} h(x)h(y)[1 - F(x, y, n)]^n\,dy\,dx + \int_{C_2}\int_{C} h(x)h(y)[1 - F(x, y, n)]^n\,dy\,dx, \tag{2.13}$$
and from (2.12) and the known bounds on h(x) and the volumes of $C_2$ and $C_2(x)$, (2.13) can be written as
$$\int_{C_1}\int_{C_1(x)} h(x)h(y)\Big[1 - \{f(x) + f(y)\}\frac{t}{n} - \{R(x, t/n) + R(y, t/n)\}\Big(\frac{t}{n}\Big)^{1+1/k}\Big]^n dy\,dx + \frac{R_5(n)}{n^{1/k}}, \tag{2.14}$$
where $|R_5(n)| < \theta_{10}$ for all sufficiently large n. Just as in the discussion following (2.4), we find that the integral appearing in (2.14) can be written as
$$\int_{C_1}\int_{C_1(x)} h(x)h(y)\exp[-t f(x) - t f(y)]\,dy\,dx + R_6(n)/n^{1/k}, \tag{2.15}$$
where $|R_6(n)| < \theta_{11}$ for all sufficiently large n. From the bounds on the volumes of $C_2$ and $C_2(x)$, the integral appearing in (2.15) can be written as
$$\int_C\int_C h(x)h(y)\exp[-t f(x) - t f(y)]\,dy\,dx + R_7(n)/n^{1/k} = A^2(t; f, h) + R_7(n)/n^{1/k}, \tag{2.16}$$
where $|R_9(n)| < \theta_9$ for all sufficiently large $n$. Using (2.13)-(2.16), we find that (2.10) can be written as

$$A^2(t; f, h) + \frac{R_{10}(n)}{n^{1/k}} \qquad (2.17)$$

where $|R_{10}(n)| < \theta_{10}$ for all sufficiently large $n$. Applying (2.17) and (2.7) to (2.9), we find that (2.8) can be written as

$$\frac{R_{11}(n)}{n^{1/k}} \qquad (2.18)$$

where $|R_{11}(n)| < \theta_{11}$ for all sufficiently large $n$. Using (2.18) and Tchebysheff's inequality, we find that for any positive $\varepsilon$,


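$$P\left[\left|\int_{S_n(t)} h(x)\,dx - [1 - A(t; f, h)]\right| \ge \varepsilon\right] < \frac{\theta_{11}}{\varepsilon^2 n^{1/k}}. \qquad (2.19)$$

For a given positive $\varepsilon$ and positive integer $m$, let $j(m; \varepsilon)$ denote the event $\left|\int_{S_m(t)} h(x)\,dx - [1 - A(t; f, h)]\right| < \varepsilon$, and let $J(m; \varepsilon)$ denote the simultaneous occurrence of all the events $j(m; \varepsilon), j(m+1; \varepsilon), \dots$. Let $K(m; \varepsilon)$ denote the simultaneous occurrence of all the events $j(m^{2k}; \varepsilon), j((m+1)^{2k}; \varepsilon), \dots$. From (2.19) and elementary considerations, we have

$$P[K(m; \varepsilon)] \ge 1 - \frac{\theta_{11}}{\varepsilon^2}\sum_{j=m}^{\infty}\frac{1}{j^2}. \qquad (2.20)$$

From (2.20), it follows that given any positive values $\varepsilon$, $\delta$, there is a finite positive integer $M(\varepsilon, \delta)$ such that if $m > M(\varepsilon, \delta)$, then $P[K(m; \varepsilon)] > 1 - \delta$.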
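Let $\theta_{12}$ be an upper bound on $h(x)$. If $u$, $v$ are positive integers with $u < v$,

$$\int_{S_v(t)} h(x)\,dx \le \int_{S_u(t)} h(x)\,dx + \theta_{12}\,\frac{(v - u)\,t}{v}, \qquad (2.21)$$

since the greatest volume each of the points $X_{u+1}, \dots, X_v$ could contribute to $S_v(t)$ is $t/v$. Choose a positive integer $w$, and let $n$ be an integer satisfying $w^{2k} < n < (w+1)^{2k}$. Using (2.21), we find

$$\int_{S_{(w+1)^{2k}}(t)} h(x)\,dx - \theta_{12}\,\frac{[(w+1)^{2k} - n]\,t}{(w+1)^{2k}} \le \int_{S_n(t)} h(x)\,dx \le \int_{S_{w^{2k}}(t)} h(x)\,dx + \theta_{12}\,\frac{[n - w^{2k}]\,t}{n}. \qquad (2.22)$$

Since $[(w+1)^{2k} - n]\,t/(w+1)^{2k}$ and $[n - w^{2k}]\,t/n$ are both no greater than $t\left[1 - \left(\frac{w}{w+1}\right)^{2k}\right]$, it follows from (2.22) that given any $\varepsilon$, there is a finite positive integer $L(\varepsilon)$ such that if $m > L(\varepsilon)$, the occurrence of $K(m; \varepsilon/2)$ implies the occurrence of $J(m^{2k}; \varepsilon)$. Define $N(\varepsilon, \delta)$ as the larger of the integers $L(\varepsilon)$ and $M(\varepsilon/2, \delta)$. If $m > N(\varepsilon, \delta)$, then $P[J(m^{2k}; \varepsilon)] > 1 - \delta$. This completes the proof of the theorem.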

Before applying the theorem to the sequential test of fit, we note that the theorem can be proved under less restrictive conditions, although the proof becomes more complicated. For example, the theorem holds if $C$ can be broken into a finite number of regions such that $f(x)$ has the property R in each separate region and the set of all the boundary points of the regions has measure zero.

It is easily seen that the convergence of $\int_{S_n(t)} h(x)\,dx$ to $1 - A(t; f, h)$ is uniform in $t$,
3. PROPERTIES OF THE SEQUENTIAL TEST


First we discuss the properties of the test when the hypothesis is true, that is, when the common distribution of $X_1, X_2, \dots$ is uniform over $C$. In this case, $B_1, B_2, \dots$ actually are independent events with common probability $p_0$, since $P(B_n) = P[X_{n+1} \text{ in } S_n(T(n))] = \text{volume of } S_n(T(n)) = p_0$. Therefore the properties of the test of fit are exactly the same as the properties of the test on $p$ when $p = p_0$: the level of significance is approximately $\alpha$, and the expected sample size is, from Wald [(1947), p. 100], approximately

$$\frac{(1-\alpha)\log[\beta/(1-\alpha)] + \alpha\log[(1-\beta)/\alpha]}{p_0\log(p_1/p_0) + (1-p_0)\log[(1-p_1)/(1-p_0)]}.$$
(The word "approximately" is used because Wald obtained the properties of his sequential test only approximately, by ignoring the excess of the probability ratio over the decision boundaries. The formula for the expected sample size should really be increased by one, since the event $B_n$ is defined using $X_{n+1}$, but this is a minor refinement.)
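The expected-sample-size formula above can be evaluated directly. The Python sketch below does so; the function name is hypothetical, and natural logarithms are assumed, although the base cancels. Following the remark above, one observation could be added to the result.

```python
import math


def wald_asn_under_h0(alpha: float, beta: float, p0: float, p1: float) -> float:
    """Wald's approximate expected sample size of the binomial SPRT when p = p0 (0 < p0 < p1 < 1)."""
    numerator = (1 - alpha) * math.log(beta / (1 - alpha)) + alpha * math.log((1 - beta) / alpha)
    denominator = p0 * math.log(p1 / p0) + (1 - p0) * math.log((1 - p1) / (1 - p0))
    return numerator / denominator  # numerator and denominator are both negative


asn = wald_asn_under_h0(alpha=0.05, beta=0.05, p0=0.5, p1=0.6)
print(round(asn, 1), "observations (add one for the X_{n+1} refinement)")
```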


Next we investigate the properties of the sequential test of fit when the hypothesis is not true. Throughout this part of the discussion, we assume that the common probability density function of $X_1, X_2, \dots$ is $f(x)$, where $f(x)$ has the property R. Define $t(f)$ as the solution in $t$ of the equation

$$1 - \int_C \exp[-t f(x)]\,dx = p_0, \qquad (3.1)$$

and denote the quantity

$$1 - \int_C f(x)\exp[-t(f)\, f(x)]\,dx \qquad (3.2)$$

by $Q(f)$. We will show that $P(B_n \mid X_1, \dots, X_n)$ converges to $Q(f)$ with probability 1 as $n$ increases. To show this, we set $h(x) = 1$ in the theorem of Section 2. The theorem then states that $\int_{S_n(t)} 1\,dx$, the volume of $S_n(t)$, converges to $1 - \int_C \exp[-t f(x)]\,dx$ with probability 1 as $n$ increases. By the definitions of $T(n)$ and $t(f)$, this implies that $T(n)$ converges to $t(f)$ with probability 1 as $n$ increases. Next we set $h(x) = f(x)$ in the theorem of Section 2. The theorem then states that $P[X_{n+1} \text{ in } S_n(t)] = \int_{S_n(t)} f(x)\,dx$ converges to $1 - \int_C f(x)\exp[-t f(x)]\,dx$ with probability 1 as $n$ increases. We have $P(B_n \mid X_1, \dots, X_n) = P[X_{n+1} \text{ in } S_n(T(n))]$, and we have seen that $T(n)$ converges to $t(f)$ with probability 1 as $n$ increases. Since the convergence is uniform in $t$, as noted at the end of Section 2, it follows that $P(B_n \mid X_1, \dots, X_n)$ converges to $1 - \int_C f(x)\exp[-t(f)\, f(x)]\,dx = Q(f)$ with probability 1 as $n$ increases.
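A rough Monte Carlo illustration of the case $h(x) = 1$ is sketched below. It assumes, as the definitions above indicate, that $S_n(t)$ is the part of $C$ covered by spheres of volume $t/n$ centred at $X_1, \dots, X_n$. The sketch takes $C$ to be the unit square and $f = 1$, so the limit is $1 - e^{-t}$. The sample size, probe count and seed are arbitrary illustrative choices. Discs that stick out of $C$ make the estimate slightly low; this effect shrinks like the boundary-layer volume $\theta/n^{1/k}$ of Section 2.

```python
import math
import random


def covered_fraction(n: int, t: float, probes: int = 4000, seed: int = 1) -> float:
    """Monte Carlo estimate of the area of S_n(t) inside C = [0, 1]^2 when f = 1.

    Assumption: S_n(t) is the union of discs of area t/n centred at n uniform
    sample points (the null hypothesis), intersected with C.
    """
    rng = random.Random(seed)
    radius_sq = (t / n) / math.pi  # a disc of area t/n has r**2 = t / (n * pi)
    centres = [(rng.random(), rng.random()) for _ in range(n)]
    hits = 0
    for _ in range(probes):
        px, py = rng.random(), rng.random()
        if any((px - cx) ** 2 + (py - cy) ** 2 <= radius_sq for cx, cy in centres):
            hits += 1
    return hits / probes


estimate = covered_fraction(n=500, t=1.0)
print(estimate, "vs limit", 1 - math.exp(-1.0))
```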


Now we can compute the approximate power of the sequential test of fit against $f(x)$ when $\alpha$ and $\beta$ are both small. When $\alpha$ and $\beta$ are both small, the test will surely take many observations before coming to a decision. If many observations are taken, even large differences between $P(B_n \mid X_1, \dots, X_n)$ and $Q(f)$ for relatively few small values of $n$ cannot have much effect on the power of the test. The properties of the Wald sequential test of a binomial parameter $p$ vary continuously with $p$. It follows that when $\alpha$ and $\beta$ are small and the common density function of $X_1, X_2, \dots$ is $f(x)$, the power of the sequential test of fit is approximately the same as the power of the Wald test of $p$ when $p$ is actually equal to $Q(f)$. That is, from p. 96 of Wald (1947), if the quantity $h$ is defined by the equation

$$Q(f) = \frac{1 - [(1-p_1)/(1-p_0)]^h}{(p_1/p_0)^h - [(1-p_1)/(1-p_0)]^h},$$

then the probability that the hypothesis is rejected when the common density function is $f(x)$ is approximately

$$\frac{1 - [\beta/(1-\alpha)]^h}{[(1-\beta)/\alpha]^h - [\beta/(1-\alpha)]^h}.$$
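The two formulas can be chained numerically: solve Wald's equation for $h$ given the true $p$, here $Q(f)$, then evaluate the rejection probability. The Python sketch below does this by bisection on $h$. The helper names and the search interval $[-60, 60]$ are illustrative choices and are not part of Wald's treatment. As sanity checks, $p = p_0$ gives $h = 1$ and power $\alpha$, and $p = p_1$ gives $h = -1$ and power $1 - \beta$.

```python
import math


def _p_of_h(h: float, p0: float, p1: float) -> float:
    """Wald's parametrisation: the binomial p that corresponds to the auxiliary quantity h."""
    if abs(h) < 1e-12:  # limiting value as h -> 0
        a, b = math.log(p1 / p0), math.log((1 - p0) / (1 - p1))
        return b / (a + b)
    q = ((1 - p1) / (1 - p0)) ** h
    return (1 - q) / ((p1 / p0) ** h - q)


def wald_power(p: float, alpha: float, beta: float, p0: float, p1: float) -> float:
    """Approximate probability that the binomial SPRT rejects p0 when the true parameter is p."""
    lo, hi = -60.0, 60.0  # p(h) decreases from near 1 to near 0 over this range
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _p_of_h(mid, p0, p1) > p:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    A, B = (1 - beta) / alpha, beta / (1 - alpha)
    if abs(h) < 1e-12:  # limiting value as h -> 0
        return math.log(1 / B) / (math.log(A) + math.log(1 / B))
    return (1 - B ** h) / (A ** h - B ** h)


# For the test of fit, p would be Q(f) for the alternative density f.
print(wald_power(0.7, alpha=0.05, beta=0.05, p0=0.5, p1=0.6))
```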
We note that a necessary condition for a reasonable power function for the sequential test is that $Q(f) > p_0$ for each density function $f(x)$ over $C$ which differs from 1 on a subset of $C$ of positive Lebesgue measure. Since $p_0 = 1 - \int_C \exp[-t(f)\, f(x)]\,dx$ by (3.1), this inequality will be proved if we show that for each positive value of $t$,

$$1 - \int_C f(x)\exp[-t f(x)]\,dx \ge 1 - \int_C \exp[-t f(x)]\,dx,$$

or equivalently, that

$$\int_C [1 - f(x)]\exp[-t f(x)]\,dx \ge 0, \qquad (3.3)$$

with equality if and only if $f(x) = 1$ almost everywhere on $C$. The integral in (3.3) can be written as

$$\int_{x:\, f(x) < 1} [1 - f(x)]\exp[-t f(x)]\,dx + \int_{x:\, f(x) > 1} [1 - f(x)]\exp[-t f(x)]\,dx. \qquad (3.4)$$

The first integral in (3.4) is greater than or equal to $\int_{x:\, f(x)<1} [1 - f(x)]\,e^{-t}\,dx$, with equality if and only if the set $\{x : x \text{ in } C \text{ and } f(x) < 1\}$ has Lebesgue measure zero. The second integral in (3.4) is greater than or equal to $\int_{x:\, f(x)>1} [1 - f(x)]\,e^{-t}\,dx$, with equality if and only if the set $\{x : x \text{ in } C \text{ and } f(x) > 1\}$ has Lebesgue measure zero. Therefore the integral in (3.3) is greater than or equal to

$$\int_{x:\, f(x)<1} [1 - f(x)]\,e^{-t}\,dx + \int_{x:\, f(x)>1} [1 - f(x)]\,e^{-t}\,dx = e^{-t}\int_C [1 - f(x)]\,dx = 0,$$

where the last step holds because $C$ has unit volume and $f$ integrates to 1 over $C$. Equality holds if and only if $f(x) = 1$ almost everywhere on $C$. This proves the desired inequality.
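The inequality $Q(f) > p_0$ can be checked numerically in a simple case. The Python sketch below assumes $k = 1$ and $C = [0, 1]$, and compares the uniform density with the illustrative alternative $f(x) = 2x$. Both choices are for illustration only. It solves (3.1) for $t(f)$ by bisection, evaluates (3.2) with a midpoint rule, and should return $Q(f) = p_0$ for the uniform density and a larger value for $f(x) = 2x$.

```python
import math


def _integral(g, m: int = 4000) -> float:
    """Midpoint rule over C = [0, 1] (k = 1, so C has unit volume)."""
    return sum(g((i + 0.5) / m) for i in range(m)) / m


def t_of_f(f, p0: float) -> float:
    """Solve 1 - int_C exp(-t f(x)) dx = p0 for t by bisection, as in (3.1)."""
    def coverage(t):
        return 1.0 - _integral(lambda x: math.exp(-t * f(x)))
    lo, hi = 0.0, 1.0
    while coverage(hi) < p0:  # coverage(t) increases with t, so bracket the root first
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if coverage(mid) < p0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q_of_f(f, p0: float) -> float:
    """Q(f) = 1 - int_C f(x) exp[-t(f) f(x)] dx, as in (3.2)."""
    t = t_of_f(f, p0)
    return 1.0 - _integral(lambda x: f(x) * math.exp(-t * f(x)))


p0 = 0.5
print(q_of_f(lambda x: 1.0, p0))      # uniform density: Q(f) equals p0
print(q_of_f(lambda x: 2.0 * x, p0))  # non-uniform density: Q(f) exceeds p0
```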

Another characteristic of interest is the expected sample size of the sequential test when the hypothesis of a uniform distribution is not true. A reasonable conjecture is that when the common density function of $X_1, X_2, \dots$ is $f(x)$, the expected sample size is approximately that of the Wald test of a binomial parameter $p$ when $p = Q(f)$. However, the sample size is an unbounded function over the sample sequences. Therefore, even though the probabilities of large deviations between $P(B_n \mid X_1, \dots, X_n)$ and $Q(f)$ are small, such deviations could still have a large disturbing effect on the expected sample size. More research must be done on this.
4. COMPARISON OF THE SEQUENTIAL TEST WITH OTHER TESTS OF FIT


There are only a few tests of fit in the multivariate case which are generally known. The most familiar is the chi-square test, which involves breaking $C$ into subregions and using the numbers of observations falling in the various subregions. Somewhat less familiar are the Kolmogoroff-Smirnov test and the von Mises test, both described by Rosenblatt (1952).

The different tests can be compared according to various criteria. For example, the purely mechanical problem of computing the test statistic can be formidable. In this respect, the chi-square test seems the most convenient, while the sequential test seems the least convenient.

Another criterion for comparing the tests is the amount of knowledge available to us about their power functions. We seem to know more about the power functions of the chi-square test and the sequential test than about the power functions of the two other tests. Our knowledge about the power function of the sequential test is also in a very convenient form, in the following sense. Suppose we want a test of fit with level of significance $\alpha$ and power $1 - \beta$ against a particular given alternative $f(x)$, where $\alpha$ and $\beta$ are small. Then we simply set $p_1 = Q(f)$ in the specification of the sequential test to achieve this. This is a reasonable way to choose a value for $p_1$, which up to now has been an arbitrary value in the open interval $(p_0, 1)$. It still leaves us the choice of the value of $p_0$. The lower the value chosen for $p_0$, the smaller the expected sample size. This might seem desirable, but there is a danger here. If the expected sample size is made too small, the test will probably not continue long enough for the theorem of Section 2 to come into play, and we will lose our knowledge about the power of the test.

By far the most important criterion for the comparison of tests is the shape of their power functions. It seems unlikely that any one of the four tests mentioned will have a power function uniformly higher than that of another of the tests. For example, it is easy to find alternatives against which the power of the chi-square test is low. Such an alternative assigns to the various subregions into which $C$ is broken the same probabilities as the hypothesis does, while within the subregions the alternative and the hypothesis can differ greatly.

Note: The test seems to be symmetric in the $k$ components of $X$, in the sense that for a given sample sequence, consistently interchanging the roles of the coordinates of $X$ will not affect the outcome of the test. This is so because the construction uses the Euclidean distance $\left[\sum_j (y_j - z_j)^2\right]^{1/2}$ between two points $(y_1, \dots, y_k)$ and $(z_1, \dots, z_k)$. This distance is unaffected by interchanging, for example, $y_1$ with $y_2$ and $z_1$ with $z_2$. The test is not invariant under permutations of the observed vectors $X_1, X_2, \dots$, but this is a familiar property of sequential tests.
THE DISTRIBUTION OF THE RATIO OF THE VARIANCES
OF VARIATE DIFFERENCES IN THE CIRCULAR CASE*


By J. N. K. RAO and G. TINTNER
Iowa State University


SUMMARY. In time series analysis, the variate difference method is used to test the order of the finite difference at which the trend or the systematic part in the time series is approximately eliminated. No exact test is available in the literature except the one proposed by Tintner, which is based on a method of selection that uses only a portion of the observations. In this paper, the statistic $V_{k+1}/V_k$ is proposed to test that the trend is approximately eliminated at the $k$-th finite differencing of the series, where $V_k$ is the variance of the series of the $k$-th differences. Its exact distribution, assuming that the observations are NI$(0, \sigma^2)$, is derived under a circular definition of the universe. The lower 5% and 1% points of the statistics $V_2/V_1$ and $V_3/V_2$ are tabulated for various values of $N$, the size of the sample. In practice, one uses the non-circular statistic with these percentage points for the circular statistic as an approximation, especially with long time series.


1. INTRODUCTION


Statistics based on successive differences have been widely used for testing the independence of successive observations in the analysis of economic time series, etc. [e.g., Anderson (1929) and Tintner (1940)]. von Neumann (1942) proposed the statistic $\delta^2/s^2$ for testing the independence of successive observations, where $\delta^2$ is the mean square successive first difference and $s^2$ is the variance of the observations. He obtained the distribution of $\delta^2/s^2$ assuming that the observations are not autocorrelated. Percentage points of the distribution of this statistic have been obtained by Hart (1942). One disadvantageous feature of $\delta^2/s^2$ is this: if there is a slow-moving trend in the mean value, $s^2$ will be heavily biased, because the mean appears in it. This difficulty could be avoided by using the statistic $V_2/V_1$, where $V_2$ is the variance of the series of the second differences and $V_1$ is the variance of the series of the first differences of the successive observations. $V_1$ and $V_2$ are independent of the mean value, and if there is a trend in the mean their bias is much smaller than the bias in $s^2$ (Kamat, 1954). Dixon (1944) derived the first two moments of $V_2/V_1$ approximately by smoothing the joint characteristic function of $V_1$ and $V_2$, using the method of Koopmans (1942) and assuming a circular definition of the universe.

In time series analysis, the variate difference method is used to test the order of the finite difference at which the trend or the systematic part in the time series is approximately eliminated. No exact test is available in the literature except the one proposed by Tintner (1940), which is based on a method of selection that uses only a portion of the observations. The ratio of the variances of the $(k+1)$-th and $k$-th differences, computed from selected observations, can be used to test the above hypothesis. In this paper we derive the exact distribution of the ratio $V_{k+1}/V_k$ under the circular definition, assuming that the observations are not autocorrelated. The lower 5% and 1% points are tabulated for the statistic $V_2/V_1$ and are compared with those obtained by a normal approximation using Dixon's (1944) first two moments of $V_2/V_1$. In practice, one can use the non-circular statistic with these percentage points for the circular statistic as an approximation, especially with long time series. The lower 5% and 1% points of the statistic $V_3/V_2$ are also tabulated; they may be useful in employing the variate difference method.

*One of the authors (J. N. K. Rao) wishes to acknowledge support from the National Science Foundation, Grant No. 14236. Journal Paper No. J-4093 of the Iowa Agricultural and Home Economics Experiment Station, Ames, Iowa, Project No. 1200.


2. THE JOINT CHARACTERISTIC FUNCTION OF $V_{k+1}$ AND $V_k$


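Let the $N$ successive observations in the sample be $x_1, \dots, x_N$, with the circular definition

$$x_{N+j} = x_j. \qquad (2.1)$$

The variance of the $k$-th difference is defined as

$$V_k = \frac{\sum_{t=1}^{N} (\Delta^k x_t)^2}{N\binom{2k}{k}}, \qquad (2.2)$$

where $\Delta^k x_t$ is the $k$-th difference of the $x_t$. To find the joint characteristic function of $V_{k+1}$ and $V_k$, we use the method Tintner (1955) used to derive the characteristic function of $V_k$. Under the null hypothesis, the observations $x_1, \dots, x_N$ are assumed to be NI$(0, \sigma^2)$. Let $\phi(t_1, t_2)$ denote the joint characteristic function of $V_{k+1}$ and $V_k$. Then

$$\phi(t_1, t_2) = \frac{1}{(2\pi\sigma^2)^{N/2}}\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}\exp\left[i t_1 V_{k+1} + i t_2 V_k - \frac{1}{2\sigma^2}\sum_{t=1}^{N} x_t^2\right] dx_1 \cdots dx_N, \qquad (2.3)$$

where $i = \sqrt{-1}$. Expanding $\sum (\Delta^k x_t)^2$ and $\sum (\Delta^{k+1} x_t)^2$ we find

$$\sum_{t=1}^{N} (\Delta^k x_t)^2 = \binom{2k}{k}\sum_{t=1}^{N} x_t^2 + 2\sum_{m=1}^{k}(-1)^m\binom{2k}{k-m}\sum_{t=1}^{N} x_t x_{t+m}, \qquad (2.4)$$

$$\sum_{t=1}^{N} (\Delta^{k+1} x_t)^2 = \binom{2k+2}{k+1}\sum_{t=1}^{N} x_t^2 + 2\sum_{m=1}^{k+1}(-1)^m\binom{2k+2}{k+1-m}\sum_{t=1}^{N} x_t x_{t+m}. \qquad (2.5)$$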
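Using equations (2.4) and (2.5), $\phi(t_1, t_2)$ can be written as

$$\phi(t_1, t_2) = \frac{1}{(2\pi\sigma^2)^{N/2}}\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}\exp\left(-\frac{\mathbf{x}'D\,\mathbf{x}}{2\sigma^2}\right) dx_1 \cdots dx_N = |D|^{-1/2}, \qquad (2.6)$$

where $\mathbf{x}$ is the column vector of the $x_t$ and $D$ is a circulant matrix with first row $d_0, d_1, \dots, d_{N-1}$. Evaluation of $|D|$ gives

$$|D| = \prod_{j=1}^{N}\left(d_0 + d_1\omega^{j} + d_2\omega^{2j} + \cdots + d_{N-1}\omega^{(N-1)j}\right), \qquad (2.7)$$

where $\omega = e^{2\pi i/N}$ is the $N$-th root of unity. The $j$-th factor equals $1 - \frac{2i\sigma^2}{N}\left[t_1\frac{|1-\omega^j|^{2k+2}}{\binom{2k+2}{k+1}} + t_2\frac{|1-\omega^j|^{2k}}{\binom{2k}{k}}\right]$ and $|1 - \omega^j|^2 = 4\sin^2(j\pi/N)$, so that

$$\phi(t_1, t_2) = \prod_{j=1}^{N}\left\{1 - \frac{2i\sigma^2}{N}\left[t_1\frac{(2\sin\frac{j\pi}{N})^{2k+2}}{\binom{2k+2}{k+1}} + t_2\frac{(2\sin\frac{j\pi}{N})^{2k}}{\binom{2k}{k}}\right]\right\}^{-1/2}. \qquad (2.11)$$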
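From (2.11) one can easily derive the following variances and correlation coefficients:

$$\sigma_{k+1}^2 = \operatorname{var}(V_{k+1}) = \frac{2\sigma^4}{N}\,\frac{\binom{4k+4}{2k+2}}{\binom{2k+2}{k+1}^2}, \qquad \sigma_k^2 = \operatorname{var}(V_k) = \frac{2\sigma^4}{N}\,\frac{\binom{4k}{2k}}{\binom{2k}{k}^2},$$

$$\rho_k^2 = \operatorname{corr}^2(V_{k+1}, V_k) = \frac{\binom{4k+2}{2k+1}^2}{\binom{4k}{2k}\binom{4k+4}{2k+2}} = \frac{8k^2 + 10k + 2}{8k^2 + 10k + 3}. \qquad (2.13)$$

It is interesting to note from (2.13) that

$$\rho_1^2 = \frac{20}{21} \approx 0.952 \qquad (2.14)$$

and that $\rho_k^2$ increases monotonically with $k$.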
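The binomial expressions in (2.13) can be cross-checked against the circulant eigenvalues behind (2.11). For the check, write $V_k = \sum_j w_j Z_j^2$ with $w_j = (2\sin\frac{j\pi}{N})^{2k}/(N\binom{2k}{k})$ and $Z_j$ independent standard normals, taking $\sigma^2 = 1$. The Python sketch below computes the variances and the covariance both ways; the function names are illustrative. It also confirms $E(V_k) = \sum_j w_j = 1$ and $\rho_k^2 = (8k^2+10k+2)/(8k^2+10k+3)$, assuming $N$ exceeds $2k+2$.

```python
import math


def closed_form(k: int, n: int):
    """var(V_k), var(V_{k+1}), cov(V_k, V_{k+1}) from the binomial expressions, with sigma^2 = 1."""
    b = math.comb
    var_k = 2 * b(4 * k, 2 * k) / (n * b(2 * k, k) ** 2)
    var_k1 = 2 * b(4 * k + 4, 2 * k + 2) / (n * b(2 * k + 2, k + 1) ** 2)
    cov = 2 * b(4 * k + 2, 2 * k + 1) / (n * b(2 * k, k) * b(2 * k + 2, k + 1))
    return var_k, var_k1, cov


def from_eigenvalues(k: int, n: int):
    """The same moments summed over the circulant eigenvalues (2 sin(j pi / N))**(2m)."""
    def weights(m):
        return [(2 * math.sin(j * math.pi / n)) ** (2 * m) / (n * math.comb(2 * m, m))
                for j in range(1, n + 1)]
    w_k, w_k1 = weights(k), weights(k + 1)
    # V = sum_j w_j * Z_j**2 with Z_j iid N(0, 1): var = 2 sum w**2, cov = 2 sum w * w'
    var_k = 2 * sum(w * w for w in w_k)
    var_k1 = 2 * sum(w * w for w in w_k1)
    cov = 2 * sum(u * v for u, v in zip(w_k, w_k1))
    return var_k, var_k1, cov, sum(w_k)  # the last entry is E(V_k)


for k in (1, 2, 3):
    var_k, var_k1, cov = closed_form(k, n=20)
    print(k, cov ** 2 / (var_k * var_k1), (8 * k * k + 10 * k + 2) / (8 * k * k + 10 * k + 3))
```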


Now we show that $V_{k+1}$ and $V_k$ have a joint bivariate normal distribution as $N \to \infty$. Consider the joint cumulant generating function of the standardized variates

$$V_{k+1}^* = \frac{V_{k+1} - \sigma^2}{\sigma_{k+1}} \quad \text{and} \quad V_k^* = \frac{V_k - \sigma^2}{\sigma_k}, \qquad (2.15)$$

which can be obtained from (2.11) as

$$\psi(t_1, t_2) = \log\phi^*(t_1, t_2) = -i\sigma^2\left(\frac{t_1}{\sigma_{k+1}} + \frac{t_2}{\sigma_k}\right) - \frac12\sum_{j=1}^{N}\log\left\{1 - \frac{2i\sigma^2}{N}\left[\frac{t_1\,(2\sin\frac{j\pi}{N})^{2k+2}}{\sigma_{k+1}\binom{2k+2}{k+1}} + \frac{t_2\,(2\sin\frac{j\pi}{N})^{2k}}{\sigma_k\binom{2k}{k}}\right]\right\}, \qquad (2.16)$$

where $\phi^*(t_1, t_2)$ is the joint characteristic function of the standardized variates $V_{k+1}^*$ and $V_k^*$. Now we expand the logarithm in (2.16) and use the relations

$$\sum_{j=0}^{N-1}\sin^{2m}\frac{j\pi}{N} = \frac{N}{2^{2m}}\binom{2m}{m}, \qquad m < N. \qquad (2.17)$$

Letting $N \to \infty$, after some simplification we find that

$$\phi^*(t_1, t_2) \to \exp\left[-\tfrac12\left(t_1^2 + t_2^2 + 2\rho_k t_1 t_2\right)\right], \qquad (2.18)$$

which is the joint characteristic function of two variates with a bivariate normal distribution with zero means, unit variances and correlation coefficient $\rho_k$. Hence, as $N \to \infty$, $V_{k+1}$ and $V_k$ have a bivariate normal distribution with the same mean $\sigma^2$, variances $\sigma_{k+1}^2$ and $\sigma_k^2$, and correlation coefficient $\rho_k$.
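3. THE EXACT DISTRIBUTION OF THE RATIO $V_{k+1}/V_k$

In order to find the exact distribution of the ratio $V_{k+1}/V_k$ from the joint characteristic function $\phi(t_1, t_2)$, we use the inversion theorem for the ratio (Gurland, 1948). The cumulative distribution function of $V_{k+1}/V_k$ is

$$G(\lambda) = \Pr\left(\frac{V_{k+1}}{V_k} < \lambda\right) = \Pr(V_{k+1} - \lambda V_k < 0), \qquad (3.1)$$

and from Gurland (1948) it follows that

$$G(\lambda) = \frac12 - \frac{1}{2\pi i}\,\mathcal{P}\!\!\int_{-\infty}^{\infty}\frac{\phi(t, -t\lambda)}{t}\,dt, \qquad (3.2)$$

where $\mathcal{P}\!\int$ denotes the Cauchy principal value and $\phi(t, -t\lambda)$ is obtained by substituting $t_1 = t$ and $t_2 = -t\lambda$ for $(t_1, t_2)$ in (2.11). Consider now the two cases $N$ odd and $N$ even. When $N$ is odd, it is easily seen from (2.11) that

$$\phi(t, -t\lambda) = \prod_{j=1}^{(N-1)/2}(1 - a_j i t)^{-1}, \qquad (3.3)$$

and when $N$ is even,

$$\phi(t, -t\lambda) = (1 - c\,i t)^{-1/2}\prod_{j=1}^{(N-2)/2}(1 - a_j i t)^{-1}, \qquad (3.4)$$

where

$$a_j = \frac{2\sigma^2}{N}\left[\frac{2^{2k+2}\sin^{2k+2}(j\pi/N)}{\binom{2k+2}{k+1}} - \lambda\,\frac{2^{2k}\sin^{2k}(j\pi/N)}{\binom{2k}{k}}\right] \qquad (3.5)$$

and

$$c = \frac{2\sigma^2}{N}\left[\frac{2^{2k+2}}{\binom{2k+2}{k+1}} - \lambda\,\frac{2^{2k}}{\binom{2k}{k}}\right]. \qquad (3.6)$$

Case 1: $N$ odd. Now (3.3) can be put into partial fractions as

$$\phi(t, -t\lambda) = \sum_{j=1}^{(N-1)/2}\frac{B_j}{1 - a_j i t}, \qquad (3.7)$$

where

$$B_j = \frac{a_j^{(N-3)/2}}{\prod_{l \ne j}(a_j - a_l)}. \qquad (3.8)$$

Therefore, from (3.2) it follows that

$$G_1(\lambda) = \frac12 - \sum_{j=1}^{(N-1)/2} B_j F_1(j, \lambda), \qquad (3.9)$$

where

$$F_1(j, \lambda) = \frac{1}{2\pi i}\,\mathcal{P}\!\!\int_{-\infty}^{\infty}\frac{dt}{t(1 - a_j i t)}. \qquad (3.10)$$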
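By contour integration, the evaluation of (3.10) gives

$$F_1(j, \lambda) = \begin{cases} \tfrac12 & \text{if } a_j > 0, \\ -\tfrac12 & \text{if } a_j < 0. \end{cases} \qquad (3.11)$$

Therefore

$$G_1(\lambda) = \frac12 - \frac12\sum_{j=1}^{(N-1)/2}\delta_j B_j, \qquad (3.12)$$

where

$$\delta_j = \begin{cases} 1 & \text{if } a_j > 0, \\ -1 & \text{if } a_j < 0. \end{cases} \qquad (3.13)$$

When $\lambda < 0$, $a_j > 0$ for all $j$, so that

$$G_1(\lambda) = \frac12 - \frac12\sum_j B_j = 0, \qquad (3.14)$$

since $\sum_j B_j = 1$ (put $t = 0$ in (3.7)). Also, for $\lambda$ sufficiently large, say $T$, $a_j < 0$ for all $j$, so that $G_1(T) = \frac12 + \frac12\sum_j B_j = 1$, and therefore

$$G_1(T) = G_1(\infty) = 1. \qquad (3.15)$$

This provides a verification that $G_1(\lambda)$ given by (3.12) is in fact a cumulative distribution function.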
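Case 2: $N$ even. Now (3.4) can be put into partial fractions as

$$\phi(t, -t\lambda) = (1 - c\,i t)^{-1/2}\sum_{j=1}^{(N-2)/2}\frac{A_j}{1 - a_j i t}, \qquad (3.16)$$

where

$$A_j = \frac{a_j^{(N-4)/2}}{\prod_{l \ne j}(a_j - a_l)}. \qquad (3.17)$$

Therefore, from (3.2) it follows that

$$G_2(\lambda) = \frac12 - \sum_{j=1}^{(N-2)/2} A_j F_2(j, \lambda), \qquad (3.18)$$

where

$$F_2(j, \lambda) = \frac{1}{2\pi i}\,\mathcal{P}\!\!\int_{-\infty}^{\infty}\frac{dt}{t(1 - a_j i t)(1 - c\,i t)^{1/2}}. \qquad (3.19)$$

Let $I_{m_1, m_2}(z)$ denote the distribution function of $\chi^2_{m_1}/\chi^2_{m_2}$, where $\chi^2_{m_1}$ and $\chi^2_{m_2}$ are independent $\chi^2$ variables with $m_1$ and $m_2$ degrees of freedom respectively. Then we have from Gurland (1948)

$$I_{m_1, m_2}(z) = \frac12 - \frac{1}{2\pi i}\,\mathcal{P}\!\!\int_{-\infty}^{\infty}\frac{dt}{t(1 - 2it)^{m_1/2}(1 + 2izt)^{m_2/2}}. \qquad (3.20)$$

To evaluate (3.19) we make use of (3.20) and distinguish the two cases $a_j > 0$ and $a_j < 0$.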
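Case $a_j > 0$. Putting $t' = a_j t/2$ in (3.19), we have

$$F_2(j, \lambda) = \frac{1}{2\pi i}\,\mathcal{P}\!\!\int_{-\infty}^{\infty}\frac{dt'}{t'(1 - 2it')\{1 - 2i(c/a_j)t'\}^{1/2}}, \qquad (3.21)$$

so that, by (3.20),

$$F_2(j, \lambda) = \frac12 - I_{2,1}\!\left(-\frac{c}{a_j}\right). \qquad (3.22)$$

It is to be noted that $c/a_j$ does not contain the nuisance parameter $\sigma^2$.

Case $a_j < 0$. In this case the substitution reverses the direction of integration, and we have

$$F_2(j, \lambda) = \frac12 - \left[1 - I_{2,1}\!\left(-\frac{c}{a_j}\right)\right]. \qquad (3.23)$$

Hence

$$G_2(\lambda) = \frac12 - \sum_{j=1}^{(N-2)/2} A_j\,\delta'_j, \qquad (3.24)$$

where

$$\delta'_j = \begin{cases} \frac12 - I_{2,1}(-c/a_j) & \text{if } a_j > 0, \\ \frac12 - \left[1 - I_{2,1}(-c/a_j)\right] & \text{if } a_j < 0. \end{cases} \qquad (3.25)$$

Evaluation of $I_{2,1}(-c/a_j)$ gives

$$I_{2,1}\!\left(-\frac{c}{a_j}\right) = \begin{cases} 1 - \dfrac{1}{\sqrt{1 - (c/a_j)}} & \text{if } -c/a_j > 0, \\ 0 & \text{if } -c/a_j \le 0. \end{cases} \qquad (3.26)$$

When $\lambda < 0$, $a_j > 0$ and $c > 0$, so that $I_{2,1}(-c/a_j) = 0$ and

$$G_2(\lambda) = \frac12 - \frac12\sum_j A_j = 0. \qquad (3.27)$$

When $\lambda = T$, where $T$ is large enough that $a_j < 0$ and $c < 0$, we have $I_{2,1}(-c/a_j) = 0$, so that

$$G_2(T) = \frac12 + \frac12\sum_j A_j = 1 \qquad (3.28)$$

and therefore

$$G_2(T) = G_2(\infty) = 1. \qquad (3.29)$$

This provides a verification that $G_2(\lambda)$ given by (3.24) is in fact a cumulative distribution function.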
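The exact distribution can be evaluated with a few lines of code. The Python sketch below relies on two simplifications. For odd $N$, $G = \sum_{a_j < 0} B_j$, which is (3.12) rewritten using $\sum_j B_j = 1$. For even $N$, the sketch adds $\operatorname{sign}(a_j)\,A_j\,I_{2,1}(-c/a_j)$ to $\sum_{a_j<0} A_j$, with $I_{2,1}(z) = 1 - (1+z)^{-1/2}$ for $z > 0$. The common factor $2\sigma^2/N$ is dropped because it changes no sign and no ratio. Lower percentage points are found by bisection in place of the paper's Bessel inverse interpolation. The function names are illustrative. With these conventions the sketch gives values close to several tabulated exact points, for example about 0.654 for $N = 7$ and 0.593 for $N = 6$.

```python
import math


def _coefficients(lam: float, k: int, n: int):
    """a_j of (3.5) and, for even N, c of (3.6), with the positive factor 2*sigma**2/N dropped."""
    b_k, b_k1 = math.comb(2 * k, k), math.comb(2 * k + 2, k + 1)

    def coef(s: float) -> float:  # s = sin^2(j*pi/N)
        return 4.0 ** (k + 1) * s ** (k + 1) / b_k1 - lam * 4.0 ** k * s ** k / b_k

    m = (n - 1) // 2 if n % 2 else (n - 2) // 2
    a = [coef(math.sin(j * math.pi / n) ** 2) for j in range(1, m + 1)]
    c = coef(1.0) if n % 2 == 0 else None  # unpaired root j = N/2, where sin^2 = 1
    return a, c


def exact_cdf(lam: float, k: int, n: int) -> float:
    """G(lambda) = Pr(V_{k+1}/V_k < lambda) for the circular statistic."""
    a, c = _coefficients(lam, k, n)
    weights = []  # B_j of (3.8) or A_j of (3.17): a_j**(m-1) / prod_{l != j}(a_j - a_l)
    for j, aj in enumerate(a):
        denom = math.prod(aj - al for l, al in enumerate(a) if l != j)
        weights.append(aj ** (len(a) - 1) / denom)
    # Odd N: 1/2 - 1/2 * sum(delta_j * B_j) reduces to the sum of B_j over a_j < 0.
    g = sum(w for w, aj in zip(weights, a) if aj < 0)
    if c is not None:
        # Even N: add sign(a_j) * A_j * I_{2,1}(-c/a_j).
        for w, aj in zip(weights, a):
            if aj == 0.0:
                continue  # measure-zero edge case; avoids dividing by zero
            z = -c / aj
            i21 = 1.0 - 1.0 / math.sqrt(1.0 + z) if z > 0 else 0.0
            g += math.copysign(1.0, aj) * w * i21
    return g


def lower_point(alpha: float, k: int, n: int, lo: float = 0.0, hi: float = 1.5) -> float:
    """Solve G(lambda) = alpha by bisection (the ratio never exceeds 4/3 for k >= 1)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if exact_cdf(mid, k, n) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (6, 7, 8, 9):
    print(n, round(lower_point(0.05, 1, n), 3), round(lower_point(0.01, 1, n), 3))
```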
4. COMPUTATION OF PERCENTAGE POINTS FOR $V_2/V_1$ AND $V_3/V_2$

The probabilities $G(\lambda)$ are computed on an IBM 650 for values of $\lambda$ at intervals of 0.05 and for $N = 6$ to 25. Bessel's second-degree inverse interpolation formula is then used, since third differences are small. This gives the values of $\lambda$ corresponding to $G(\lambda) = 0.05$ and $G(\lambda) = 0.01$, denoted by $\lambda^{(1)}$ and $\lambda^{(2)}$ respectively, which appear in columns (2) and (5) of Table 1 below. The approximation of Dixon (1944), based on the smoothing process, results in the following moments of $V_2/V_1$:

$$E(V_2/V_1) = \dots \qquad (4.1)$$

and

$$\operatorname{var}(V_2/V_1) = \dots \qquad (4.2)$$


TABLE 1. PERCENTAGE POINTS OF $V_2/V_1$ AND $\delta^2/2s^2$

                 5% points                        1% points
        λ^(1)    λ^(1)     λ'^(1)       λ^(2)    λ^(2)     λ'^(2)
  N     exact    approx.   δ²/2s²       exact    approx.   δ²/2s²
 (1)    (2)      (3)       (4)          (5)      (6)       (7)
  6     0.593    0.652     0.534        0.434    0.527
  7     0.654    0.678     0.546        0.485    0.561
  8     0.675    0.699     0.561        0.542    0.589
  9     0.688    0.717     0.576        0.569    0.618
 10     0.708    0.731     0.590        0.584    0.633
 11     0.723    0.744     0.603        0.606    0.650
 12     0.735    0.756     0.615        0.626    0.665
 13     0.747    0.765     0.626        0.640    0.678
 14     0.758    0.775     0.636        0.654    0.691
 15     0.766    0.782     0.646        0.667    0.701
 16     0.774    0.790     0.654        0.678    0.711
 17     0.782    0.796     0.663        0.688    0.719
 18     0.788    0.802     0.670        0.698    0.728
 19     0.795    0.808     0.677        0.706    0.735
 20     0.800    0.813     0.684        0.715    0.742
 21     0.806    0.818     0.690        0.722    0.749
 22     0.811    0.823     0.696        0.729    0.755
 23     0.815    0.827     0.702        0.735    0.761
 24     0.820    0.831     0.707        0.741    0.766     0.580
 25     0.823    0.833     0.712        0.747    0.770     0.587
 30              0.849     0.734                 0.791     0.618
 35              0.861     0.751                 0.807     0.643
 40              0.870     0.765                 0.820     0.663
 45              0.878     0.778                 0.831     0.681
 50              0.884     0.788                 0.839     0.695


Using a normal approximation for the distribution of $V_2/V_1$, with the mean and the variance given by (4.1) and (4.2) respectively, the values of $\lambda^{(1)}$ and $\lambda^{(2)}$ are obtained and are given in columns (3) and (6) of Table 1. For comparison, columns (4) and (7) of Table 1 give the values of $\lambda$ corresponding to $P(\delta^2/2s^2 < \lambda) = 0.05$ and $P(\delta^2/2s^2 < \lambda) = 0.01$, denoted by $\lambda'^{(1)}$ and $\lambda'^{(2)}$ respectively. These values are obtained from Hart's (1942) table of percentage points for $\delta^2/s^2$. Table 1 shows that Dixon's approximation is fairly accurate when $N$ is greater than 15, particularly for the 5% values. Since the exact evaluation becomes cumbersome as $N$ increases, for some selected values of $N > 25$ we give only the values of $\lambda^{(1)}$ and $\lambda^{(2)}$ obtained from Dixon's approximation.
Since the percentage points of $V_3/V_2$ may be quite useful in the variate difference method, the values of $\lambda$ corresponding to $G(\lambda) = 0.05$ and $G(\lambda) = 0.01$, denoted by $\lambda^{(3)}$ and $\lambda^{(4)}$ respectively, are obtained using the same procedure employed for $V_2/V_1$. The values of $\lambda^{(3)}$ and $\lambda^{(4)}$ are given in columns (2) and (4) of Table 2.
TABLE 2. PERCENTAGE POINTS OF $V_3/V_2$

             λ^(3) (5%)            λ^(4) (1%)
  N     exact    approx.      exact    approx.
 (1)    (2)      (3)          (4)      (5)
  9     0.765                 0.645
 10     0.779                 0.679
 11     0.786                 0.696
 12     0.800                 0.708
 13     0.809                 0.722
 14     0.816                 0.734
 15     0.821                 0.743
 16     0.828                 0.752
 17     0.836                 0.762
 18     0.841                 0.770
 19     0.846                 0.777
 20     0.851    0.863        0.783    0.813
 21     0.855    0.867        0.789    0.817
 22     0.859    0.870        0.794    0.822
 23     0.863    0.873        0.800    0.826
 24     0.867    0.876        0.806    0.830
 25     0.870    0.879        0.812    0.834
 30              0.890                 0.849
 35              0.899                 0.861
 40              0.906                 0.871
 45              0.912                 0.879
 50              0.917                 0.885


The evaluation of the first two moments of $V_3/V_2$ through Dixon's method of approximation seems to be complicated. Therefore, we use the approximate formulas for the first two moments of the ratio of two random variables. Then, to $O(N^{-1})$, we find

$$E\left(\frac{V_3}{V_2}\right) = 1 + \frac{\sigma_2^2 - \rho_2\sigma_2\sigma_3}{\sigma^4} \qquad (4.3)$$

and

$$\operatorname{var}\left(\frac{V_3}{V_2}\right) = \frac{\sigma_3^2 + \sigma_2^2 - 2\rho_2\sigma_2\sigma_3}{\sigma^4}, \qquad (4.4)$$

since

$$E(V_2) = E(V_3) = \sigma^2. \qquad (4.5)$$

Now using (2.13) we obtain

$$E\left(\frac{V_3}{V_2}\right) = 1 - \frac{14}{45N} \qquad (4.6)$$

and

$$\operatorname{var}\left(\frac{V_3}{V_2}\right) = \frac{49}{450N}. \qquad (4.7)$$

Using a normal approximation for the distribution of $V_3/V_2$, with the mean and the variance given by (4.6) and (4.7) respectively, the values of $\lambda^{(3)}$ and $\lambda^{(4)}$ are obtained and are given in columns (3) and (5) of Table 2. Since the approximate formulas are satisfactory only for large $N$, we tabulated the values for $N \ge 20$.

The normal approximation seems to be quite satisfactory.
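The normal approximation for $V_3/V_2$ is easy to reproduce. The Python sketch below builds the delta-method mean and variance from the binomial expressions for $\sigma_k^2$, $\sigma_{k+1}^2$ and $\rho_k\sigma_k\sigma_{k+1}$, with $\sigma^2 = 1$, and takes normal quantiles from the standard library's `statistics.NormalDist`. The function names are illustrative. For $k = 2$ it reduces to $1 - 14/(45N)$ and $49/(450N)$ and gives values close to column (3) and column (5) of Table 2.

```python
import math
from statistics import NormalDist


def ratio_moments(k: int, n: int):
    """Delta-method mean and variance of V_{k+1}/V_k with sigma^2 = 1, as in (4.3)-(4.4)."""
    b = math.comb
    var_k = 2 * b(4 * k, 2 * k) / (n * b(2 * k, k) ** 2)
    var_k1 = 2 * b(4 * k + 4, 2 * k + 2) / (n * b(2 * k + 2, k + 1) ** 2)
    cov = 2 * b(4 * k + 2, 2 * k + 1) / (n * b(2 * k, k) * b(2 * k + 2, k + 1))  # rho_k sigma_k sigma_{k+1}
    mean = 1 + var_k - cov
    var = var_k1 + var_k - 2 * cov
    return mean, var


def normal_lower_point(alpha: float, k: int, n: int) -> float:
    """Lower alpha-point of the normal distribution with the delta-method moments."""
    mean, var = ratio_moments(k, n)
    return mean + NormalDist().inv_cdf(alpha) * math.sqrt(var)


for n in (20, 25, 30, 50):
    print(n, round(normal_lower_point(0.05, 2, n), 3), round(normal_lower_point(0.01, 2, n), 3))
```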
5. EXAMPLE


To illustrate the use of the percentage points given in Tables 1 and 2 in the application of the variate difference method, we take the example of the yearly series of the quantity of meat consumed in the United States, 1919-1941 (Tintner, 1952, p. 320). The noncircular definition of the variance of the $k$-th difference is

$$V'_k = \frac{\sum_{t=1}^{N-k}(\Delta^k x_t)^2}{(N-k)\binom{2k}{k}}. \qquad (5.1)$$

It is found that for the present data

$$V'_0 = 62.2517, \quad V'_1 = 23.5864, \quad V'_2 = 17.0411, \quad V'_3 = 15.7392.$$

The problem now is to find the order $k$ of the finite difference at which the systematic part (or trend) of the time series is sufficiently eliminated. Now we have

$$\frac{V'_1}{V'_0} = 0.3789^{**}, \qquad \frac{V'_2}{V'_1} = 0.7225^{**}, \qquad \text{and} \qquad \frac{V'_3}{V'_2} = 0.9236,$$

where $N = 23$. We use the percentage points in Tables 1 and 2, which correspond to the circular statistic $V_{k+1}/V_k$, as approximations to the percentage points of the noncircular statistic $V'_{k+1}/V'_k$. We find that $V'_1/V'_0$ and $V'_2/V'_1$ are significant at the 1% level, but $V'_3/V'_2$ is not significant at the 5% level. Therefore, the systematic part may be considered to have been eliminated in the finite difference series of order $k = 2$.
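The arithmetic of the example can be checked with the short Python sketch below. It implements the noncircular variance (5.1) for $k \ge 1$ and recomputes the ratios from the reported variances. The raw series is not reproduced, so only the printed summary values are used. $V'_1$ is taken as 23.5864, the value consistent with the printed ratio $V'_1/V'_0 = 0.3789$. The ratios are compared with the $N = 23$ entries of Table 1 (column 5) and Table 2 (column 2). The function and constant names are illustrative.

```python
import math


def noncircular_variance(x, k: int) -> float:
    """V'_k of (5.1): sum of squared k-th differences divided by (N - k) * C(2k, k), for k >= 1."""
    d = list(x)
    for _ in range(k):
        d = [b - a for a, b in zip(d, d[1:])]  # forward differences
    return sum(v * v for v in d) / ((len(x) - k) * math.comb(2 * k, k))


# Reported variances for the meat-consumption series (N = 23).
V = {0: 62.2517, 1: 23.5864, 2: 17.0411, 3: 15.7392}
ratios = {k: V[k + 1] / V[k] for k in range(3)}

LOWER_1PCT_V2_V1_N23 = 0.735  # Table 1, column (5), N = 23
LOWER_5PCT_V3_V2_N23 = 0.863  # Table 2, column (2), N = 23

print({k: round(r, 4) for k, r in ratios.items()})
print("V'_2/V'_1 significant at 1%:", ratios[1] < LOWER_1PCT_V2_V1_N23)
print("V'_3/V'_2 significant at 5%:", ratios[2] < LOWER_5PCT_V3_V2_N23)
```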
LIMIT DISTRIBUTIONS OF THE CIRCULAR SERIAL
CORRELATION COEFFICIENT


By ROY LEIPNIK
Michelson Laboratory, California


SUMMARY. Limit distributions of a renormalized circular serial correlation coefficient from a circular version of a stationary Gaussian-Markov process are derived from expressions obtained by Madow (1945) and Leipnik (1947). The limit calculations are validated by summability arguments. Characteristic functions and moments are found. Iterated limit distributions, as the correlation tends to 1, are also discussed.
1. INTRODUCTION 


The distribution of the serial correlation coefficient from a stationary Gaussian-Markov process is extremely complicated (Leipnik, 1958b). The circular (periodic) modification, while somewhat simpler in appearance, remains rather opaque. A smoothed approximate distribution, whose first (3) moments are correct, has been derived (Leipnik, 1947), but its overall accuracy is questionable.

The differences between distributions are often illuminated and sharpened by the study of limiting forms. A study of limiting forms of the circular distribution and the smoothed circular distribution therefore seemed justified. The formal results were easily obtained, but justification of the steps was hard to find. The necessary tools were found in Hurwitz's theory of the consistency of Toeplitz methods of summability with Cesàro summability.
2. THE UHLENBECK-ORNSTEIN PROCESS AND ITS CIRCULAR MODIFICATION


For $0 < \rho < 1$, consider the Gaussian process $\{X(t), 0 \le t < \infty\}$ such that $E[X(t_1)X(t_2)] = \sigma^2\rho^{|t_1 - t_2|}$ and $E[X(t)] = 0$, known as the Uhlenbeck-Ornstein process. It is stationary and Markov; conversely, the only such one-dimensional processes are of Uhlenbeck-Ornstein type. For each $h > 0$, the process $\{X_0^{(h)}, X_1^{(h)}, \dots, X_j^{(h)}, \dots\}$, with $X_j^{(h)} = X(jh)$, is a discrete stationary Gaussian-Markov process. The serial correlation coefficient

$$r^{(h)} = \frac{\sum_j X_j^{(h)} X_{j+1}^{(h)}}{\sum_j \left(X_j^{(h)}\right)^2}$$

(sometimes defined with a different range of summation for the $(X_j^{(h)})^2$ in the denominator) has been extensively studied as an estimator of $\rho^h$. The von Neumann statistic, built on $\sum_j (X_{j+1}^{(h)} - X_j^{(h)})^2$, is now recognized to lead to better estimates of $\rho^h$, but $r^{(h)}$ still holds much interest. Unfortunately, its distribution is excessively complicated (Leipnik, 1958b).
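This difficulty is reduced by introducing, after Hotelling, a periodic (circular) process $\tilde X^{(h)}$ and an associated coefficient. To arrive at these, note that $X_0^{(h)}$ and $X_{j+1}^{(h)} - \rho^h X_j^{(h)} = Y_{j+1}^{(h)}$ are independent Gaussian with mean zero, and $E[(Y_{j+1}^{(h)})^2] = \sigma^2(1 - \rho^{2h})$ for $j = 0, \dots, n-1$.

Now define $\tilde Y_1, \dots, \tilde Y_n$ to be independent Gaussian random variables with mean zero,

$$E[\tilde Y_j^2] = \tilde\sigma^2(h, \rho, n), \qquad j = 1, \dots, n,$$

and define the $\tilde X_j^{(h)}$ by

$$\tilde X_{j+1}^{(h)} - \rho^h \tilde X_j^{(h)} = \tilde Y_{j+1}, \quad j = 0, \dots, n-1, \qquad \text{and} \qquad \tilde X_n^{(h)} = \tilde X_0^{(h)}.$$

It follows that

$$\tilde X_j^{(h)} = \frac{1}{1 - \rho^{nh}}\sum_{k=0}^{n-1}\rho^{h[(j-k+n-1) \bmod n]}\,\tilde Y_{k+1}.$$

Direct calculation yields

$$E[\tilde X_j^{(h)}\tilde X_k^{(h)}] = \frac{\tilde\sigma^2(h, \rho, n)}{(1 - \rho^{2h})(1 - \rho^{nh})}\left(\rho^{h|j-k|} + \rho^{h(n - |j-k|)}\right) \qquad \text{for } 0 \le j, k \le n-1.$$

If in particular $\tilde\sigma^2(h, \rho, n)$ is taken to be $\sigma^2(1 - \rho^{2h})$, then

$$E[\tilde X_j^{(h)}\tilde X_k^{(h)}] = \frac{\sigma^2}{1 - \rho^{nh}}\left(\rho^{h|j-k|} + \rho^{h(n - |j-k|)}\right),$$

a convenient form for most purposes. (For the study of $\rho \to 1$, another choice of $\tilde\sigma^2(h, \rho, n)$ is preferable.)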
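The circular serial correlation coefficient

$$\tilde r^{(h)} = \frac{\sum_{j=0}^{n-1}\tilde X_j^{(h)}\tilde X_{j+1}^{(h)}}{\sum_{j=0}^{n-1}\left(\tilde X_j^{(h)}\right)^2}$$

is an estimator of $\rho^h$ with a simpler distribution (Madow, 1945) than that of $r^{(h)}$.

The continuous version of $\tilde X^{(h)}$ is defined as follows: $\{\tilde X_T(t), 0 \le t \le T\}$ is a Gaussian stationary process with mean zero and autocorrelation

$$E[\tilde X_T(t_1)\tilde X_T(t_2)] = \frac{\sigma^2}{1 - \rho^T}\left(\rho^{|t_1 - t_2|} + \rho^{T - |t_1 - t_2|}\right).$$

Thus $E[(\tilde X_T(T) - \tilde X_T(0))^2] = 0$, and $\tilde X_T(T) = \tilde X_T(0)$ with probability 1. Note that as $T \to \infty$, the above autocorrelation tends for $\rho < 1$ to $\sigma^2\rho^{|t_1 - t_2|}$, the autocorrelation of the Uhlenbeck-Ornstein process. Clearly, for given $n$ and $h$, the random variables $\tilde X_0^{(h)}, \dots, \tilde X_{n-1}^{(h)}$ can be recovered from $\tilde X_T(t)$ by setting $T = nh$ and $\tilde X_j^{(h)} = \tilde X_T(jh)$, $j = 0, \dots, n-1$. Henceforth this embedding of $\{\tilde X_j^{(h)}\}$ in $\{\tilde X_T(t)\}$ will be assumed. It will also be convenient to set $\beta = -\log\rho$, so that

$$E[\tilde X_T(t_1)\tilde X_T(t_2)] = \sigma^2\,\frac{\cosh\left[\beta\left(\frac{T}{2} - |t_1 - t_2|\right)\right]}{\sinh(\beta T/2)}.$$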
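The covariance formulas above can be checked numerically without simulation. The Python sketch below builds the coefficient matrix of the representation of $\tilde X_j^{(h)}$ in terms of the $\tilde Y$'s and confirms that it satisfies the circular recursion. It then compares the resulting covariances with the closed form and with the $\cosh$ form at $T = nh$. The parameter values and function names are arbitrary illustrative choices, and $\tilde\sigma^2 = \sigma^2(1-\rho^{2h})$ is assumed.

```python
import math


def coefficient_matrix(n: int, rho_h: float):
    """c[j][k] such that X~_j = sum_k c[j][k] * Y~_{k+1}, from the circular representation."""
    scale = 1.0 / (1.0 - rho_h ** n)
    return [[scale * rho_h ** ((j - k + n - 1) % n) for k in range(n)] for j in range(n)]


def covariance_direct(j: int, k: int, n: int, rho: float, h: float, sigma2: float = 1.0) -> float:
    """E[X~_j X~_k] from the coefficients, with var(Y~) = sigma2 * (1 - rho**(2h))."""
    c = coefficient_matrix(n, rho ** h)
    var_y = sigma2 * (1.0 - rho ** (2 * h))
    return var_y * sum(a * b for a, b in zip(c[j], c[k]))


def covariance_closed(j, k, n, rho, h, sigma2=1.0):
    d = abs(j - k)
    return sigma2 * (rho ** (h * d) + rho ** (h * (n - d))) / (1.0 - rho ** (n * h))


def covariance_cosh(t1, t2, T, rho, sigma2=1.0):
    """Continuous form with beta = -log(rho)."""
    beta = -math.log(rho)
    return sigma2 * math.cosh(beta * (T / 2 - abs(t1 - t2))) / math.sinh(beta * T / 2)


n, rho, h = 7, 0.6, 0.5
for j, k in [(0, 0), (0, 1), (2, 6)]:
    print(j, k, covariance_direct(j, k, n, rho, h), covariance_closed(j, k, n, rho, h),
          covariance_cosh(j * h, k * h, n * h, rho))
```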


3. DISTRIBUTION OF $\tilde r^{(h)}$


Madow (1945) showed how to derive the distribution of $\tilde r^{(h)}$ (see also Lehmann, 1947), which after simplification becomes


4 exp (=F) sinh ed 


VEU ИЕ, (3.1) 
Щ1—ехр(—/)) 


Gag (=) = PP a) =1- 


(n-3)/2 
Р@ x 1— cos 2 „дешы =g 4 
X(—1)*- cos м 
kel ^| - cosh (Bh) —cos т cosh (#Ж%)— 


where p(? is the first non-negative integer j such that 


eos [Элу 2c сов эт. 
n n 
An approximation Ghaap to бъља with the same first E) moments is 


T 
Gy n,0(2) = ( i i 


АВ (Leipnik, 1947) (85) 
E ir) 


Та пу”-олаехр (—2/)—2t exp (— fh)". 
A 


Henceforth (3.1) and its related forms will be known as exact, whereas (3.2) and its related forms will be known as smoothed.


4. A SEQUENCE OF ESTIMATORS OF p” 


There is a way of renormalizing a sequence $(u_n)$ of statistics which may force it to converge to a constant. Suppose the distributions of $(u_n)$ contain a parameter $\gamma$, and that $u_n$ is, in some sense, an estimator of $a(\gamma, n)$. If $f(a(\gamma, n), n) = \gamma$, then we say that $(f(u_n, n))$ is an estimate-renormalized sequence of statistics for $\gamma$. Of course, this terminology is only suggestive. Since $r^{(h)}$ is an estimator of $\rho^h = \exp(-\beta h)$, the sequence $((r^{(h)})^n)$ is an estimate-renormalized sequence for $\exp(-\beta n h)$. If now $X_j^{(h)}$ is imbedded in $X_T(t)$ with $T = nh$ in the manner shown in Section 2, then $\{s_n\}$ is an estimate-renormalized sequence for $\exp(-\beta T) = \rho^T$, where

$$s_n = (r^{(h)})^n. \qquad \ldots (4.1)$$


Since $r^{(h)}$ is not a positive random variable, $s_n$ is guaranteed to be non-negative only for $n$ even.


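If

$$K_{n,T,\beta}(y) = P[s_n \le y], \qquad \ldots (4.2)$$

then for $n$ even

$$K_{n,T,\beta}(y) = \begin{cases} 0, & y < 0,\\ G_{n,T,\beta}(y^{1/n}) - G_{n,T,\beta}(-y^{1/n}), & 0 \le y \le 1,\end{cases}$$

and for $n$ odd

$$K_{n,T,\beta}(y) = \begin{cases} G_{n,T,\beta}\left(-|y|^{1/n}\right), & -1 \le y < 0,\\ G_{n,T,\beta}(y^{1/n}), & 0 \le y \le 1.\end{cases}$$

In particular,

$$K_{n,T,\beta}(0) = \begin{cases} 0 & \text{for } n \text{ even},\\ G_{n,T,\beta}(0) & \text{for } n \text{ odd}.\end{cases}$$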


If $\lim_n K_{n,T,\beta}(0) = 0$ and $\lim_n G_{n,T,\beta}(y^{1/n}) = K_{T,\beta}(y)$ exists and is a distribution for $0 \le y \le 1$, then $K_{T,\beta}$ will be the limiting distribution of $(s_n)$.

Similarly, a smoothed distribution $\tilde K_{n,T,\beta}$ can be defined in terms of $\tilde G_{n,T,\beta}$. If $\lim_n \tilde K_{n,T,\beta}(0) = 0$ and $\lim_n \tilde G_{n,T,\beta}(y^{1/n}) = \tilde K_{T,\beta}(y)$ exists for $0 \le y \le 1$, and $\tilde K_{T,\beta}$ is a distribution, it is the limiting smoothed distribution.


5. $\lim_n G_{n,T,\beta}(0) = 0$


1— —4exp(— 2 7) sinh ð 


Let езы оу=- A (5.1) 
n( 1— exp (22) 
1—cos zak cos mb 0 
nla Iu nk n 
EC» cos = oak 5} 
cosh — — cos d ‘cosh — 
n 


where à — = 5 


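Formally taking $n \to \infty$, writing $\sum_{K=1}^{\infty} (-1)^{K-1} = \tfrac{1}{2}$, and using the identity

$$\sum_{K=1}^{\infty} \frac{(-1)^{K-1}}{\delta^2 + \pi^2 K^2} = \frac{1}{2\delta}\left(\frac{1}{\delta} - \frac{1}{\sinh \delta}\right)$$

[Whittaker and Watson (1948, p. 136)], we quickly find $\lim_n G_{n,T,\beta}(0) = 0$. The justification obviously calls for a summability argument.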
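The identity used in the formal limit is the standard partial-fraction expansion $1/\sinh x = 1/x + 2x\sum_{K \ge 1} (-1)^K/(x^2 + K^2\pi^2)$, rearranged. A quick numerical check of the rearranged form, assuming it is applied as $\sum_{K\ge1}(-1)^{K-1}/(\delta^2+\pi^2K^2) = (1/\delta - 1/\sinh\delta)/(2\delta)$; the function names are ours.

```python
import math


def alternating_sum(delta: float, terms: int = 200_000) -> float:
    """Partial sum of sum_{K>=1} (-1)^(K-1) / (delta^2 + pi^2 K^2)."""
    total, sign = 0.0, 1.0
    for k in range(1, terms + 1):
        total += sign / (delta * delta + (math.pi * k) ** 2)
        sign = -sign
    return total


def closed_form(delta: float) -> float:
    """(1/delta - 1/sinh(delta)) / (2 delta), from the partial fractions of 1/sinh."""
    return (1.0 / delta - 1.0 / math.sinh(delta)) / (2.0 * delta)


for delta in (0.5, 1.0, 3.0):
    print(delta, alternating_sum(delta), closed_form(delta))
```

Because the series alternates with decreasing terms, the truncation error is below the first omitted term, about $10^{-12}$ here.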
If $T = [t_{K,n}]$ is an infinite matrix, the sequence $\left[\sum_{K} t_{K,n} a_K\right]$ is called the Toeplitz sum sequence of the series $\sum a_K$, and its limit, if it exists, is called the Toeplitz sum $(T)\sum_{K=1}^{\infty} a_K$. In particular, if $t_{K,n} = 1 - (K-1)/n$ for $K \le n$ and $t_{K,n} = 0$ for $K > n$, the Cesàro sum is obtained, denoted by

$$(C)\sum_{K=1}^{\infty} a_K.$$

The result

$$(C)\sum_{K=1}^{\infty} (-1)^{K-1} = \tfrac{1}{2}$$

is classical. The (T) sum is said to be consistent with the (C) sum if

$$(T)\sum_{K=1}^{\infty} a_K = (C)\sum_{K=1}^{\infty} a_K \quad \text{whenever } (C)\sum_{K=1}^{\infty} a_K \text{ is finite.}$$
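The Cesàro weights $t_{K,n} = 1 - (K-1)/n$ give exactly the average of the first $n$ partial sums. A sketch in exact rational arithmetic, showing both forms on the series $1 - 1 + 1 - \cdots$; the helper names are hypothetical.

```python
from fractions import Fraction


def cesaro_mean(terms):
    """(C,1) mean: the average of the first n partial sums of the series."""
    partial, total_of_partials = Fraction(0), Fraction(0)
    for a in terms:
        partial += a
        total_of_partials += partial
    return total_of_partials / len(terms)


def toeplitz_cesaro(terms):
    """The same quantity written as sum_K t_{K,n} a_K with t_{K,n} = 1 - (K-1)/n."""
    n = len(terms)
    return sum((1 - Fraction(k - 1, n)) * a for k, a in enumerate(terms, start=1))


grandi = [(-1) ** (k - 1) for k in range(1, 1001)]  # 1 - 1 + 1 - ... (1000 terms)
print(cesaro_mean(grandi), toeplitz_cesaro(grandi))  # 1/2 1/2
```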
The Hurwitz conditions (see Moore, 1938) for the (T) sum to be consistent with the (C) sum are


(Н) Him (g,—0; (Hs) Ёш try 15 (H) sup E K|Metgal < 9, 
Koo K-*- -n K-1 


where Айк» I tron кая hne 
тк \®, n—3 9 т 
Let tra = (00877 ) A mte (1): tas o K> (1). 


(H₁) is obviously satisfied, and (H₂) is easily checked since


: lim tza = ( lim. (cos ЛЕ y" Y 


п ә 0 


апа p. 


«(erre 


As for (H₃), note that
# (oos ах) п = a?b, (сов аа): [(b,—1) sin?az— cos?a«] 
- 65%, Thus (cos az)» 


71 sin b=} and negative for ж < a7 sin 
z) for b >l. It 


is positive for v > @ 
is convex in (0, a sin 5,3) and concave in ( a> sin! bpt, a 
follows that there is an integer dn; 


ld sint (,)*| < 1, 
b
such that A? ( cos 21) "< O for 1 & K «d, 
bn 
an A? ( cos 27) > 0for d,<K<("). 
Application of the partial summation formula 
Ey 
Ик, тә (К1)А 1n K1Atg, ntr, ions t, +1,n, 


easily proved by induction, yields 


Р К |А ox K AL d, | A* bu К At 
ENT t TN 

Kel | x, al imb K, E al enl SPUR K, 
= ФАЧ, „|М, Ад, „ии в „1 


< 24, шах [|At, „|, 14, 1-3 


dian 


since |tz „| < Lforn > =. Clearly, bothof these first differences are bounded above 


bami 
by the value | i (cos ax)bn| at ж = a~ sin 53, which equals abi (cos sinc! b>)". 
sin by? n 8-1 
Hence d, max ПАС „|, [AC, т < S (1—571) 
which tends to 1 as $n \to \infty$ and is therefore bounded, proving (H₃).


Now consider the Toeplitz matrix defined by tin = ( cos E ) gw» LE {ax} 


is bounded, then 
нү; 5 LnL— Q4) 
Him | 2 (@ te) ак| < sup Jag]. Шш У (1—cos 7E 


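Note that $1 - \cos(2\pi x)\exp(2\pi^2 x^2)$ is 0 at $x = 0$, and its derivative

$$2\pi \cos(2\pi x)\exp(2\pi^2 x^2)\left(\tan 2\pi x - 2\pi x\right) > 0 \quad \text{for } 0 < x < \tfrac14.$$

Hence $\cos 2\pi x < \exp(-2\pi^2 x^2)$ for $0 < x < \tfrac14$.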
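The derivative argument settles the inequality on $(0, \tfrac14)$; a grid check makes the size of the margin visible. Near $x = 0$ the gap behaves like $(2\pi x)^4/12$, which is why it is tiny but strictly positive there. A sketch; `gap` is our name.

```python
import math


def gap(x: float) -> float:
    """exp(-2 pi^2 x^2) - cos(2 pi x); the inequality says this is positive on (0, 1/4)."""
    return math.exp(-2.0 * math.pi ** 2 * x * x) - math.cos(2.0 * math.pi * x)


grid = [k / 1000 for k in range(1, 250)]   # 0.001, ..., 0.249
print(all(gap(x) > 0 for x in grid))       # True
print(min(gap(x) for x in grid))           # smallest margin, attained at the left end of the grid
```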
For ауаз к С (2) › COs s < cos 27::-5/12: < exp (—2л?п—5/6), 
/a 
(cos 22 y < exp (—7?n9$), 
т 
so that D кл. (1-Е) < “ехр(—л?л1/%)(сов 2mn-5/12)-3/2—0. 
Куут! ^ 4 
On the other hand,


(con J, 


50 У tx, „(1—00 2 ) 2 а ( cos с ( 1—cos i ) 


272K? ү" mk 


1 492-208 
К < тпа ( т? 2л? 


SmTK? TK? 
105 Ета ec M pes 7 
< К< nue ( + т? ) 2n? } 
< br*n4--1571& 9? for n> 5. 


Hence lim | 3 (кліка) ак = 0, 
n | Kat 


and the modified summability method is also consistent with Cesaro summability. 


(na) А 
Thus lim (ИК, —(0) $ (-DE-! = 4. 
п Kel K-1 
Now 
eS 2gK j 
(nla) я Ge n Z (n/a) „Жу 28 
kai Е 3 NET 2 
УЕ. 3 эк |= 2, (CU ten ( cosh = 1)" 
cosh'— — cos 
т 
(п/а) у 28 22K \ 1-1 
K-1 2 —— —— . ex (9. 
х2 c fg, | 2? ( cosh 5 вов — )] poa) 


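At this point it is helpful to invoke a lemma.

Lemma: If $\{f_{K,n}\}$ is a non-negative double sequence such that, for some $N_0$, $f_{K,n} \le f_{K-1,n}$ for $n > N_0$ and $1 < K \le p_n$, where $p_n \to \infty$, $\lim_n f_{K,n} = f_K$ exists for all $K$, and $\sum_{K=1}^{\infty} (-1)^{K-1} f_K$ converges, then

$$\lim_{n\to\infty} \sum_{K=1}^{p_n} (-1)^{K-1} f_{K,n} = \sum_{K=1}^{\infty} (-1)^{K-1} f_K.$$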
For the proof, see Leipnik (19582). Now take 


-.., 20 22K \7? 
Tea ign [v^ ( cosh > — eos Es )] 5 
and note that for n > 3, fz, is à product of monotone non-increasing expressions in 
: 28 27K | E 
К, that p, = (n/4)00, that ina fin = lim m? (cosh 9 — сов?" =(20%--2л2К?)-1,, 
and that Я (—1)E-! (202-272)! converges. Hence (5.2) converges to 
K=1 $ : < 


сыену чы A 
- | 1—0? a фл 2sinh O° 
Comparison with (5.1) shows that $G_{n,T,\beta}(0)$ converges to


д 
1—4 sinh à Ue oF (za) = 0 
2sinh д n —28 дү Р 
*(1—ехр( —®)(созһ°) ) 
as stated. Since $G_{n,T,\beta}$ is a cumulative distribution, it follows that $\lim_n G_{n,T,\beta}(-\epsilon_n) = 0$ for all non-negative sequences $(\epsilon_n)$. From (4.2), $\lim_n K_{n,T,\beta}(0) = 0$ both for $n$ even and for $n$ odd, and $\lim_n K_{n,T,\beta}(y) = \lim_n G_{n,T,\beta}(y^{1/n})$ for $0 < y < 1$, if the latter exists.
6. LIMITING EXACT DISTRIBUTION 
The analysis of 
1—4 exp ( =) sinh à 
Grim p") = 


1—8)/2 
Эл. ("—3)/ 


cosh ш 1 © 
ле numi 
28 2nK 
cosh ——cos —— 
n 
is simpler than that of $G_{n,T,\beta}(0)$. Recall that


1 > cos ( “ХОЛ ) 2 yl" > cos = (py(y!*)3-1) 


so that "t руч") < cost уй" < 27 (рит) 1), 
Now n cos-!y!"—oofor 0 < y < 1, since lim cosy” dm Y 108 E BA 
290 X a0- l—y” 
gfo 2K дт (n-3)/2 
Also, cos es 7t 
ш cosh 22 —yun 
n 
А И cos ER an ( cosh ES ) 
and: сов ТК n Я т 
? \ cosh г уч" ( cosh 28 сов Е ) 
л п n 


are decreasing in K and tend respectively to 
82-472K2 да лак? 82 
е: а —— —— | ж эр. 
x»( log y ) E MP ( log у ) 47k? 
Application of the Lemma to both parts of the above expression yields


K,(y) = lim Gnom Bly") = 1— 2 sinh à 5 pza mE BHK?) . 
à im. Gn rm By”) Ў (— s sis iuo ) 
foro pe 2. (6.2) 


Trivially, $K_\delta(1) = \lim_n G_{n,T,\beta}(1) = 1$, so that $K_\delta$ is indeed a distribution, continuous except perhaps at 0 and 1. Now


1 2 sinhó 5 $c ma л?К? Е SUM 
M, (e) = K,(expz)3) = Ч mise 
l62«0. 
Th $ ЛАК EAE = E 
Since EU Ln d (72 K34- 82)? = (т2К24-52) 0 < 6—8? 


for each $C > 1$, it follows that summation over $K$ and limits as $z \to \infty$ can be interchanged. Hence

$$\lim_{y \to 1-} K_\delta(y) = \lim_{z \to \infty} M_\delta(z) = 1.$$

The continuity at the other end is more difficult to obtain.
Consider the summability method defined by $\lim_{z \to 0+} \sum_{K=1}^{\infty} a_K e^{-z\lambda_K}$. Garabedian (1931) showed that if


lim Ак оо and for some a> 0, lim e E 
K> log r>% К’ 
$i © y LJ 
then $\lim_{z \to 0+} \sum_{K=1}^{\infty} a_K e^{-z\lambda_K} = (C)\sum_{K=1}^{\infty} a_K$.

Now for $\lambda_K = \pi^2 K^2$, these conditions are certainly satisfied.


В $ (ТЕ TK = т2Е°2 
Hence gn i2 (—1) лака е 
TK? 5 
K- = 
ed E (-0* prank үтү 
and thus lim M;(z)— dum Ky) = 0. 
z—> 0+ or 
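The effect of the damping factors $e^{-z\pi^2K^2}$ on the series $1 - 1 + 1 - \cdots$ can be seen numerically: as $z \to 0+$ the damped sums settle at $\tfrac12$, the (C) sum of the series. A sketch; the function name is ours.

```python
import math


def damped_alternating_sum(z: float, terms: int = 2000) -> float:
    """sum_{K>=1} (-1)^(K-1) exp(-z pi^2 K^2), i.e. the method with lambda_K = pi^2 K^2."""
    total, sign = 0.0, 1.0
    for k in range(1, terms + 1):
        total += sign * math.exp(-z * (math.pi * k) ** 2)
        sign = -sign
    return total


for z in (1.0, 0.1, 0.01, 0.001):
    print(z, damped_alternating_sum(z))
# The values approach 1/2 as z decreases.
```

By the theta-function transformation the deviation from $\tfrac12$ is of order $z^{-1/2}e^{-1/(4z)}$, so it vanishes very quickly.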


The c.f. of $M_\delta$ is not hard to obtain. Let


sinh д qx. S 


Флаг T e DE iia —1)Ё=1 
xt) = Г ee ам ә) = lim fae ам) = 2 —— іа, (—1) 


x TK? e—s(m2K* 4- $2—it) , 
л°К?--8%—й 
By an argument like that of the last paragraph, it is easy to show that


B 


x(t) = ED 


sinh à — J/ói—it 
sinh 4/92 — jt 


Since 


clearly, 


2 (1—7 1—8 Cua 


Ши (07) = eite , m. (6.3) 
ô — о 


Thus if $u_\delta$ is a random variable with distribution $K_\delta$ and $v_\delta = (-\log u_\delta)^{-1}$, then $v_\delta$ is a random variable with distribution $M_\delta$, and $W_\delta = T v_\delta$ has a c.f. which tends to $e^{it/\beta}$, the c.f. of $1/\beta$, as $T \to \infty$. Hence, in a weak sense, $(r^{(T/n)})^{n/T}$ is an estimator of $\rho = e^{-\beta}$ for $n$ and $T$ large.


The distributions $K_\delta$ and $M_\delta$ are reminiscent of the Kolmogorov-Smirnov distribution, though slightly more complicated. The mean and variance of $M_\delta$ are easily calculated from $\chi_\delta$ as


1 


20 


(сти 


1 


à 


EE Кеш Fink) в 1). 


7. $\lim_n \tilde G_{n,T,\beta}(0) = 0$


A smoothed approximation $\tilde G_{n,T,\beta}$ to $G_{n,T,\beta}$ was derived by Leipnik (1947) and Quenouille (1949). The expression is


п 
iE a pe S а)" рә apes, is (11) 
r(z)r(2 z) = 
To find the limiting renormalized distribution, the procedure of Sections 5 and 6 is imitated. Note that
= ) +2 ехр( d t js dt 


"n 


1 
c" f ауто (146 
Gus = Gage ee (| 


17 E 
2 f (1—6) "-0/2dt 
0 


so that $\lim_n \tilde G_{n,T,\beta}(0) = 0$.


8. THE LIMITING SMOOTHED DISTRIBUTION 


== = = у 
As before, let Kns (y) = б.т (y^) = Gnat J ¢,(t)dt, 


n 
where (t) = rre (1—g2)m»2 
"auri s) 
x ( 1+exp( -2 ) —2 exp( -2) a” T E (Чг, ср (8.1) 


A limit theorem for Lebesgue integration (Scheffé, 1947) states that if $\phi_n$ is non-negative and measurable on $[0, 1]$, $\lim_n \phi_n = \phi$ exists, and

$$\lim_n \int_0^1 \phi_n(t)\,dt = \int_0^1 \phi(t)\,dt = 1,$$

then

$$\lim_n \int_A \phi_n(t)\,dt = \int_A \phi(t)\,dt$$

for each measurable set $A$.
Z p; 1 
Now ; бта (1) = 1 Gt I ф„(@@, 
= 1 aay 
80 1 = lim Gna Him I Ф002 = lim [ $.(t)dt. 
: ‹ i 
lim saa tee —1 lima 1 
E ACRES 
Е 2 2 2 А 
for i> 0, lim (2n)-# (1—8/)- = (—41ogi) for 0 ct — 1,
H : 
lim ‚1—@” n/a 
45 28 n 
t ( L+exp( 22. )-2 exp (= FF) wy ) 
3d 1—" n/a 
=i = = 1982 ор 
LR [ (2) (=з e) | eB гез“ ost, 
Sf na Mi + e8?/log t е (8.2 
во that $t) = іш ot) = ЖИ" 4(—log t)-* eà?llog t , (8.2) 
Note that 
| 20 703 >н — exp (J(1— /4pJ-1) 
dex —з— 4 
J etat po [т exp( — эт ) т VERI 
80 Tota ZAN 
and the p-th momentof K,(y) = SA ] 1-1 (—log 0-4 es2/tog t dt ... (83) 
" exp exp (J(1—4/4p-F1)) | 
d ИЕ 
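The transformed distribution

$$\tilde M_\delta(y) = \tilde K_\delta\left((\exp y^{-1})^{-1}\right) = \frac{e^{\delta}}{2\sqrt{\pi}} \int_0^{y} s^{-3/2} \exp\left(-\delta^2 s - \frac{1}{4s}\right) ds \qquad \ldots (8.4)$$

for $0 < y < \infty$.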


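The c.f. $\psi_\delta$ of $\tilde M_\delta$ is easily found as

$$\psi_\delta(t) = \frac{e^{\delta}}{2\sqrt{\pi}} \int_0^{\infty} s^{-3/2} \exp\left(-\delta^2 s - \frac{1}{4s} + its\right) ds = \exp\left(\delta\left(1 - \sqrt{1 - \frac{it}{\delta^2}}\right)\right). \qquad \ldots (8.5)$$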


The mean and variance of $\tilde M_\delta$ are $1/(2\delta)$ and $1/(4\delta^3)$, respectively.
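The density in (8.4) is that of an inverse Gaussian law with mean $1/(2\delta)$ and shape parameter $\tfrac12$, which is where the mean $1/(2\delta)$ and variance $1/(4\delta^3)$ come from. A numerical sketch, assuming the density $e^{\delta}(2\sqrt{\pi})^{-1} s^{-3/2}\exp(-\delta^2 s - 1/(4s))$ (our reading of a damaged display), integrated by Simpson's rule; the helper names are hypothetical.

```python
import math


def smoothed_density(s: float, delta: float) -> float:
    """Assumed limiting smoothed density e^delta / (2 sqrt(pi)) s^(-3/2) exp(-delta^2 s - 1/(4 s))."""
    if s <= 0.0:
        return 0.0
    return (math.exp(delta) / (2.0 * math.sqrt(math.pi))
            * s ** -1.5 * math.exp(-delta * delta * s - 1.0 / (4.0 * s)))


def simpson(f, a: float, b: float, panels: int = 60_000) -> float:
    """Composite Simpson rule on [a, b] with an even number of panels."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


delta = 1.0
upper = 60.0 / delta ** 2   # beyond this point exp(-delta^2 s) < e^-60, so the tail is negligible
mass = simpson(lambda s: smoothed_density(s, delta), 0.0, upper)
mean = simpson(lambda s: s * smoothed_density(s, delta), 0.0, upper)
second = simpson(lambda s: s * s * smoothed_density(s, delta), 0.0, upper)
print(mass, mean, second - mean ** 2)   # approximately 1, 1/(2 delta), 1/(4 delta^3)
```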


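Clearly,

$$\lim_{T\to\infty} \psi_\delta(tT) = \lim_{T\to\infty} \exp\left(\delta\left(1 - \sqrt{1 - \frac{itT}{\delta^2}}\right)\right) = e^{it/\beta}.$$

Thus although $\psi_\delta$ and $\chi_\delta$ have quite different appearances, the limits as $T \to \infty$ of $\chi_\delta(tT)$ and $\psi_\delta(tT)$ are the same, namely the c.f. of the constant $1/\beta$.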
9. LIMITING DISTRIBUTIONS AS $\delta \to 0$


Formally, the above distributions and c.f.'s have limits as $\delta \to 0$. For the exact distributions,


Ky) = lim Куу) = 1— -:Yi- 13 exp (=) 


log y 
_, —TAK3y 
My) ас е Ў 2s (9.1) 
vit | 
DES 
A) sinh4/—it D 
For the smoothed distributions,
Куу) = ve el ТЕК log tdt 


(9.2) 
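$$\tilde M_0(y) = \frac{1}{2\sqrt{\pi}} \int_0^{y} s^{-3/2} \exp\left(-\frac{1}{4s}\right) ds, \qquad \psi_0(t) = \exp\left(-\sqrt{-it}\right).$$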
Curiously enough, $M_0$ has moments of all orders, while the $p$-th moment of $\tilde M_0$ is finite for $p < \tfrac12$ and infinite for $p \ge \tfrac12$. It is amusing to note that


Roly) = ys = Z] enean 
is the positive normal distribution. 

There is a reason to expect such limiting distributions to exist as $\delta \to 0$. In Section 2, a set of independent random variables of variance $\sigma^2(h, \rho, n)$ was introduced, and a related set $\{X_j^{(h)}\}$ of random variables of autocorrelation

ü a am gi (pA peii) 
was defined. For simplicity, the agreement Th, n, p) = 0%(1—p™) was made, and the 
distribution of $r^{(h)}$ was written down. Since $r^{(h)}$ is homogeneous of degree zero, its


distribution is independent of the choice of $\sigma^2(h, \rho, n)$. If the choice $\sigma^2(h, n, \rho) =$


125 
{Х 1(Ф)} becomes 


т\ш i: ) is taken, so that tho autocorrelation of the corresponding continuous process’ 
p 


25 A x It5—5] + pT 16—01) 
и = у 7 


then вре = S арии, 
If in the last equation the "limit" $\rho \to 1$ is taken, the heuristic result is

$$E[(X_T(t_2) - X_T(t_1))^2] = \sigma^2 |t_2 - t_1|,$$

the defining condition for Brownian motion. But for $T$ fixed, $\rho \to 1$ if and only if $\delta \to 0$, so that the limiting distributions as $\delta \to 0$ are formally related to a non-degenerate process, namely Brownian motion.

The author is indebted to Professor T. Koopmans for suggesting the problem 
in 1946 and for his good advice, too seldom taken. Helpful remarks were also made 
at various times by Professors Arthur Livingston, Herman Rubin, and Robert Tate. 
Much of the work was done while the author was employed as a fire lookout for the 
U.S. Forest Service, an organization to which he owes many thanks. 
STATISTICS PROPOSED FOR VARIOUS TESTS OF HYPOTHESES
AND THEIR DISTRIBUTIONS IN PARTICULAR CASES*


By O. P. BAGAI
Punjab University, Chandigarh


SUMMARY. We find the distributions of six statistics in multivariate analysis of variance. The distributions of the statistics for the cases l = 2, 3 in the form of definite integrals, and the limiting distributions of two of the six statistics for the cases l = 2, 3, ..., 6 in the form of series, are also given.


1. INTRODUCTION 


In multivariate analysis of variance (Pillai, 1954) the three tests of hypotheses, (I) equality of two dispersion matrices, (II) equality of the p-dimensional mean vectors, and (III) independence between a p-set and a q-set of variates, depend, when the respective hypotheses to be tested are true, only on the roots $\theta_i$ or $\phi_i$ ($i = 1, 2, \ldots, l$) respectively of the determinantal equations

$$|A - \theta(A + C)| = 0 \qquad \ldots (1.1)$$

and

$$|A - \phi C| = 0 \qquad \ldots (1.2)$$

where A and C are independent sum of products (S.P.) matrices, based on sample observations with $n_1$ and $n_2$ degrees of freedom (d.f.) respectively, which can be defined differently for different hypotheses.


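The common standard form (Nanda, 1948; Roy, 1957) of the joint distribution of the eigenroots of (1.1) under the respective hypotheses is as follows:

$$C(m, n, l) \prod_{i=1}^{l} \theta_i^{m} (1-\theta_i)^{n} \prod_{i>j} (\theta_i - \theta_j) \prod_{i=1}^{l} d\theta_i \qquad \ldots (1.3)$$

for $0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_l \le 1$, $l = \min(p, n_1)$, and

$$C(m, n, l) = \frac{\pi^{l/2} \prod_{i=1}^{l} \Gamma\left(\frac{2m+2n+l+i+2}{2}\right)}{\prod_{i=1}^{l} \Gamma\left(\frac{2m+i+1}{2}\right)\Gamma\left(\frac{2n+i+1}{2}\right)\Gamma\left(\frac{i}{2}\right)} \qquad \ldots (1.4)$$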
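and the values of l, m, n for the respective hypotheses are as follows:

(I) $l = p$, $m = \tfrac12(n_1 - p - 1)$, $n = \tfrac12(n_2 - p - 1)$ ... (1.5)

(II) If $p \le n_1$, $l = p$, then $m = \tfrac12(n_1 - p - 1)$, $n = \tfrac12(n_2 - p - 1)$;
and if $p > n_1$, $l = n_1$, then $m = \tfrac12(p - n_1 - 1)$, $n = \tfrac12(n_2 - p - 1)$ ... (1.6)

(III) Same as (II). ... (1.7)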


The common standard form (Hsu, 1939) of the joint distribution of the eigenroots of (1.2), under the respective hypotheses, is as follows:

$$C(m, n, l) \prod_{i=1}^{l} \phi_i^{m} (1+\phi_i)^{-(m+n+l+1)} \prod_{i>j} (\phi_i - \phi_j) \prod_{i=1}^{l} d\phi_i \qquad \ldots (1.8)$$

for $0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_l < \infty$,

where l, m, n and C(m, n, l) are as defined in (1.5), (1.6), (1.7) and (1.4) respectively.


*A part of the thesis submitted in partial fulfilment of the requirements for the Ph.D. degree in the Department of Mathematics, University of British Columbia, Vancouver, Canada.
Nanda (1948) gives the limiting form of (1.3) by setting $\theta_i = C_i/n$ and then letting $n \to \infty$. The limiting distribution is as follows


1 1 l 11 1 
т = = : ‘ 
КІ, т) п C? exp [ 2 «| d TR [^5] TE ас; ... (1.9) 
l [2т--1+-1 i 
2 — ... . 
where К@,т) = т] пт" ЕЕ г (s) (1.10) 


and again l, m assume different values, as defined in (1.5), (1.6) and (1.7) for the respective hypotheses.


2. STATISTICS PROPOSED FOR TESTS OF HYPOTHESES (I), (II) AND (III)

We list below the statistics, expressed simultaneously in terms of the roots of both determinantal equations (1.1) and (1.2), which can be used to test the hypotheses (I), (II) and (III) with a suitable choice of the independent S.P. matrices A and C.

(i) Roy's statistics of the largest, smallest and intermediate eigenroots based on the determinantal equation (1.1). We can simultaneously propose to include the corresponding statistics based on the eigenroots of the determinantal equation (1.2).


(ii) Hotelling's $T_0^2$ statistic, defined as follows:

$$T_0^2 = m\,\mathrm{tr}(C^{-1}A) = m \sum_{i=1}^{l} \frac{\theta_i}{1-\theta_i} = m \sum_{i=1}^{l} \phi_i.$$

(iii) Wilks' $\Lambda$ statistic, defined as follows:

$$\Lambda = \frac{|C|}{|A+C|} = \prod_{i=1}^{l} (1-\theta_i) = \prod_{i=1}^{l} (1+\phi_i)^{-1}.$$

(iv) The Wilks-Lawley U-statistic, defined as follows:

$$U = \frac{|A|}{|A+C|} = \prod_{i=1}^{l} \theta_i = \prod_{i=1}^{l} \frac{\phi_i}{1+\phi_i}.$$

(v) Pillai's V-statistic, defined as follows:

$$V = \mathrm{tr}\left[A(A+C)^{-1}\right] = \sum_{i=1}^{l} \theta_i = \sum_{i=1}^{l} \frac{\phi_i}{1+\phi_i}.$$

(vi) Finally, we propose another statistic Y, defined as follows:

$$Y = \frac{|A|}{|C|} = \prod_{i=1}^{l} \frac{\theta_i}{1-\theta_i} = \prod_{i=1}^{l} \phi_i.$$

Of course, the distribution of any of these statistics, under the respective null hypotheses, can be found from either of the joint distributions (1.3) and (1.8), but it is more convenient to use (1.3) for finding the distributions of $\Lambda$, U and V, (1.8) for those of $T_0^2$ and Y, and either of the two for Roy's statistics.
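Since $\theta_i = \phi_i/(1+\phi_i)$, every statistic in the list can be computed from the roots $\phi_i$ of $|A - \phi C| = 0$ alone, and each root form can be checked against the determinant or trace form. A sketch with NumPy on randomly generated (hypothetical) S.P. matrices; the constant multiplier of $T_0^2$ is left out, so only the trace part is reported.

```python
import numpy as np


def manova_statistics(A: np.ndarray, C: np.ndarray) -> dict:
    """Section 2 statistics from the roots phi_i of |A - phi C| = 0.

    theta_i = phi_i / (1 + phi_i) are then the roots of |A - theta (A + C)| = 0.
    """
    phi = np.sort(np.linalg.eigvals(np.linalg.solve(C, A)).real)
    theta = phi / (1.0 + phi)
    return {
        "tr(C^-1 A)": phi.sum(),           # T0^2 without its constant multiplier
        "Lambda": np.prod(1.0 - theta),
        "U": np.prod(theta),
        "V": theta.sum(),
        "Y": np.prod(phi),
    }


# Hypothetical S.P. matrices for p = 3 variates, with 6 and 20 degrees of freedom.
rng = np.random.default_rng(0)
X, Z = rng.standard_normal((6, 3)), rng.standard_normal((20, 3))
A, C = X.T @ X, Z.T @ Z
stats = manova_statistics(A, C)
print(np.isclose(stats["Lambda"], np.linalg.det(C) / np.linalg.det(A + C)))  # True
print(np.isclose(stats["U"], np.linalg.det(A) / np.linalg.det(A + C)))       # True
print(np.isclose(stats["V"], np.trace(A @ np.linalg.inv(A + C))))            # True
print(np.isclose(stats["Y"], np.linalg.det(A) / np.linalg.det(C)))           # True
```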


Nanda (1948) gives the joint limiting form of (1.3), which we have listed as (1.9). Following him, the joint limiting form of (1.8) is easily shown to be the same as (1.9), by setting $\phi_i = C_i/n$ in (1.8) and then letting $n \to \infty$. The fact that the joint limiting forms of (1.3) and (1.8) are the same enables us to conclude that the limiting distributions of the statistics Y and U are the same, and that those of $T_0^2$ and V are the same except for a constant multiplier. The same can be said of Roy's statistics.
No great headway has been made so far in finding the distributions of the various statistics defined above. The classical $T^2$ is known (Rao, 1952) to be distributed, under the null hypothesis, as central chi-square with $n_1 p$ degrees of freedom (d.f.). In the case of a non-centrality parameter $\tau^2 \ne 0$, the classical $T^2$ has a non-central chi-square distribution with $n_1 p$ d.f. The exact distribution of the studentized $T_0^2$ for both the central and non-central cases is not known in compact standard form. Ito (1956) has given, under the null hypothesis, its approximate distribution as an asymptotic expansion in terms of chi-squares, each with $n_1 p$ d.f.

Wilks (1932) and Nair (1939) have given the exact distribution of $\Lambda$ for n = 1, 2 and any p, and for p = 1, 2 and any n, by comparing the moments of $\Lambda$ with those of the F-ratio. Bartlett (1938) has suggested a useful approximation, which has been used by Bartlett (1947) himself and by Rao (1952). More recently, Bannerjee (1958) has been able to give the exact distribution of $\Lambda$ in series form, but its tabular values are not yet available.

Roy (1943) and Nanda (1948a) have worked out the limiting and non-limiting distributions of the statistics (the largest, smallest and intermediate eigenroots of the determinantal equation (1.1)) used to test the hypotheses (I), (II) and (III). Their tabular values have been given by Pillai (1957) for the cases l = 2(1)5, m = 0(1)4 and n = 5 to 1,000, at both the 5% and 1% significance levels.

Pillai (1954, 1955, 1957) has succeeded in giving an approximation to the distribution of his statistic V and has been able to tabulate it for l = 2(1)5, m = −.5(.5)5(5)80, and n = 5(5)80. Nanda (1950) has also given the exact distribution for the special case m = 0.

We have taken, in Section 3(a), the statistics $T_0^2$ and Y and have been able to give their distributions for l = 2, 3 in the form of definite integrals. Since the procedure is quite similar for the remaining statistics, we do not include them here; for them the reader may see Bagai (1960).

Finally, in Section 4(a), we list some integrals (published elsewhere). In Section 4(b), we first find a new form suitable for finding the limiting distributions of the statistics U or Y, for l = 2, 3, 4 in the form of series and for l = 5, 6 in the form of double definite integrals. This method can be extended further to any value of l.

It may be noted here that we have worked out another method of integration, different from that of Nanda (1948), for finding the limiting distributions of Roy's statistics. This method works very well, especially in the case of the smallest eigenvalue. This new method of integration has further been demonstrated (Bagai, 1960) by solving some particular cases, giving various values to m, for l = 2, 3 and 4. Since the problem has already been solved by Roy, Nanda and Pillai, we do not include our method in this publication.


3. DISTRIBUTIONS OF VARIOUS STATISTICS FOR l = 2, 3
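(a) Distributions of $T_0^2$ and Y. Case I: l = 2. The joint distribution of $\phi_1$ and $\phi_2$ from (1.8) is

$$C(m, n, 2)(\phi_1\phi_2)^m \{(1+\phi_1)(1+\phi_2)\}^{-(m+n+3)} (\phi_2 - \phi_1)\, d\phi_1\, d\phi_2 \qquad \ldots (3.1)$$

where $0 \le \phi_1 \le \phi_2 < \infty$.

(i) For the Y-statistic, let

$$\phi_1\phi_2 = u, \qquad (1+\phi_1)(1+\phi_2) = v, \qquad \ldots (3.2)$$

so that $(\phi_2 - \phi_1)\, d\phi_1\, d\phi_2 = du\, dv$; this changes (3.1) into the following form:

$$C(m, n, 2)\, u^m v^{-(m+n+3)}\, du\, dv. \qquad \ldots (3.3)$$

Now the roots $\phi_1, \phi_2$ of the quadratic $z^2 - (v-u-1)z + u = 0$ are real and non-negative if $v - u - 1 \ge 2\sqrt{u}$, i.e. if $(1+\sqrt{u})^2 \le v$. Then the limits for u and v are given by

$$(1+\sqrt{u})^2 \le v < \infty \quad \text{and} \quad 0 \le u < \infty. \qquad \ldots (3.4)$$

The distribution of u (= $\phi_1\phi_2$), or Y, is given by

$$C(m, n, 2)\, u^m\, du \int_{(1+\sqrt{u})^2}^{\infty} v^{-(m+n+3)}\, dv, \qquad 0 \le u < \infty, \qquad \ldots (3.5)$$

that is,

$$\frac{C(m, n, 2)\, u^m\, du}{(m+n+2)(1+\sqrt{u})^{2(m+n+2)}}, \qquad 0 \le u < \infty. \qquad \ldots (3.6)$$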
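A useful sanity check on (3.6) is that it integrates to one. The sketch below assumes the normalising constant (1.4) has Pillai's standard form $C(m,n,l) = \pi^{l/2}\prod_i \Gamma(\tfrac{2m+2n+l+i+2}{2}) / [\Gamma(\tfrac{2m+i+1}{2})\Gamma(\tfrac{2n+i+1}{2})\Gamma(\tfrac{i}{2})]$, and integrates (3.6) over $(0,\infty)$ after the substitution $u = w^2$, $w = t/(1-t)$; the function names are hypothetical.

```python
import math


def pillai_constant(m: float, n: float, l: int) -> float:
    """Assumed C(m, n, l) of (1.4)."""
    c = math.pi ** (l / 2)
    for i in range(1, l + 1):
        c *= math.gamma((2 * m + 2 * n + l + i + 2) / 2)
        c /= (math.gamma((2 * m + i + 1) / 2) * math.gamma((2 * n + i + 1) / 2)
              * math.gamma(i / 2))
    return c


def y_density(u: float, m: float, n: float) -> float:
    """Density (3.6) of Y = phi_1 phi_2 for l = 2."""
    return (pillai_constant(m, n, 2) * u ** m
            / ((m + n + 2) * (1.0 + math.sqrt(u)) ** (2 * (m + n + 2))))


def total_mass(m: float, n: float, panels: int = 20_000) -> float:
    """Integral of y_density over (0, inf) via u = w^2, w = t/(1-t), Simpson's rule in t."""
    def g(t: float) -> float:
        if t >= 1.0:
            return 0.0
        w = t / (1.0 - t)
        return y_density(w * w, m, n) * 2.0 * w / (1.0 - t) ** 2
    h = 1.0 / panels
    s = g(0.0) + g(1.0)
    for i in range(1, panels):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3.0


for m, n in [(0, 0), (1, 0), (0.5, 2)]:
    print(m, n, total_mass(m, n))   # each close to 1
```

Analytically the total mass is $2C(m,n,2)\,B(2m+2, 2n+2)/(m+n+2)$, which equals 1 under the assumed constant (for example $C(0,0,2) = 6$ and $C(1,0,2) = 30$).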


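(ii) For the $T_0^2$-statistic, consider now the change

$$\phi_1 + \phi_2 = u, \qquad \phi_1\phi_2 = v.$$

Proceeding as above, the joint distribution (3.1) takes the following form:

$$C(m, n, 2)\, v^m (1+u+v)^{-(m+n+3)}\, du\, dv \qquad \ldots (3.7)$$

where

$$0 \le v \le u^2/4 \quad \text{and} \quad 0 \le u < \infty. \qquad \ldots (3.8)$$

Then the distribution of u is

$$C(m, n, 2)\, du \int_{0}^{u^2/4} v^m (1+u+v)^{-(m+n+3)}\, dv. \qquad \ldots (3.9)$$

Setting $v = (1+u)V_2$, we get, in place of (3.9), the following distribution:

$$C(m, n, 2)(1+u)^{-(n+2)}\, du \int_{0}^{u^2/\{4(1+u)\}} V_2^m (1+V_2)^{-(m+n+3)}\, dV_2, \qquad 0 \le u < \infty. \qquad \ldots (3.10)$$

Case II: l = 3. The joint distribution of $\phi_1, \phi_2, \phi_3$ from (1.8) is

$$C(m, n, 3) \prod_{i=1}^{3} \phi_i^m (1+\phi_i)^{-(m+n+4)} \prod_{i>j} (\phi_i - \phi_j) \prod_{i=1}^{3} d\phi_i \qquad \ldots (3.11)$$

where $0 \le \phi_1 \le \phi_2 \le \phi_3 < \infty$.

For finding the distributions of both the statistics Y and $T_0^2$ for l = 3 eigenroots, we effect the following changes:

$$\phi_1 + \phi_2 + \phi_3 = u, \qquad \phi_1\phi_2 + \phi_1\phi_3 + \phi_2\phi_3 = v, \qquad \phi_1\phi_2\phi_3 = w,$$

so that $\prod_{i>j}(\phi_i - \phi_j)\, d\phi_1\, d\phi_2\, d\phi_3 = du\, dv\, dw$.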
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a HET 


TESTS OF HYPOTHESES AND DISTRIBUTIONS 
Then the relation (3.11) reduces to

$$C(m, n, 3)\, w^m (1+u+v+w)^{-(m+n+4)}\, du\, dv\, dw \qquad \ldots (3.13)$$

where $\phi_1, \phi_2, \phi_3$ are the roots of the cubic

$$x^3 - ux^2 + vx - w = 0. \qquad \ldots (3.14)$$


(i) For the Y-statistic, in order that the roots of the cubic (3.14) be real and positive, we can (Bagai, 1961) write the limits on u, v and w as follows:
0<w<m and 0<w<m 


But? < о < WAV) PANH) L v <a, 829) 
Ba SU p; В Xu 
where f, = max [ 8v, m [94m Foni cos 245] 1. 
fom i re) ө» 805] 
pice а [а eni Jn р 
and A = ub [2 (25 ) con 4% | | © (8.16) 


where ¢, is the supplement of ¢, and is defined by, 


EUR e 9и) 3/2 M 


tan ф т 
392 — — 

2 (216! 200%u*— и ) 

for ( 216+ 2002-7. d being positive. 


Thus, the distribution of w (= $\phi_1\phi_2\phi_3$), or Y, for 3 eigenroots, from (3.13), is the following:

$$C(m, n, 3)\, w^m\, dw \iint (1+u+v+w)^{-(m+n+4)}\, du\, dv \qquad \ldots (3.18)$$

where u, v and w are as defined in (3.15).


Effecting the following change in (3.18),


v = (140), w= (14) У) 2. (3.19) 
so that du dv — (1 UR UO dU,, we get in place of (3.18), the following 
үш, ve (8.20). 
Oom m D) erroe ff, FTR FM 
for 0< w-o and Oxw < о. 
3102/8 32/143) 3012-3)? <лу<=, 
ire Ур are Тр 


в ru DERI: f. 
ет) < © (тийїї) (әйт) < “' < 17) 
Bas B'a and f, are defined as in (3.16), and u used in them is equal to (14-w) Vy: 
(ii) For the $T_0^2$-statistic, in order that the roots of the cubic (3.14) be real and positive, we write down the limits (Bagai, 1961) for u, v and w respectively as follows:
O<u<o and 0<и<о, 
Oc¢v<iw and [Losje 
о<ю</, and [w< fo 2. (8.21) 
where $f_1$ and $f_2$ are as defined in (3.26) below. Thus, the distribution of u (= $\phi_1+\phi_2+\phi_3$), or $T_0^2$, for l = 3 eigenroots, from (3.13), is the following:

$$C(m, n, 3)\, du \iint w^m (1+u+v+w)^{-(m+n+4)}\, dv\, dw \qquad \ldots (3.22)$$


with the limits for u, v and w shown in (3.21).


Effecting another change in (3.22) as follows:

$$v = (1+u)V_2, \qquad w = (1+u)(1+V_2)U_2, \qquad \ldots (3.23)$$

we get the distribution of u = $T_0^2$ for l = 3 eigenroots from (3.22) as follows:
У U?dV,dU, a9. 
б») тузуна, acea унн uu 
where oe and 0<¢u<o, 
u? u? 
Oo i me атте 5 ** т, 
ocu CC E mc А _ 3.95 
S 7$ тк) Казйр“ 7 < рәјә c 079 
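where $f_1$ and $f_2$ are respectively defined as

$$f_1 = \tfrac{1}{27}\left[u(9v - 2u^2) - 2(u^2 - 3v)^{3/2}\right] \quad \text{and} \quad f_2 = \tfrac{1}{27}\left[u(9v - 2u^2) + 2(u^2 - 3v)^{3/2}\right], \qquad \ldots (3.26)$$

and v used in them is equal to $(1+u)V_2$.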
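The bounds $f_1$ and $f_2$ are the two roots in w of the discriminant of the cubic (3.14), which factors as $16(u^2-3v)^3$ under the square root, so three real roots occur exactly when $f_1 \le w \le f_2$. A sketch that samples positive root triples and confirms that $w = \phi_1\phi_2\phi_3$ always falls between the bounds; the function name and sampling range are our choices.

```python
import random


def w_limits(u: float, v: float) -> tuple:
    """(f1, f2): w must lie in [f1, f2] for x^3 - u x^2 + v x - w to have three real roots."""
    disc = max(u * u - 3.0 * v, 0.0) ** 1.5
    base = u * (9.0 * v - 2.0 * u * u)
    return (base - 2.0 * disc) / 27.0, (base + 2.0 * disc) / 27.0


random.seed(1)
for _ in range(10_000):
    p1, p2, p3 = (random.uniform(0.0, 5.0) for _ in range(3))
    u, v, w = p1 + p2 + p3, p1 * p2 + p1 * p3 + p2 * p3, p1 * p2 * p3
    f1, f2 = w_limits(u, v)
    tol = 1e-9 * (1.0 + u ** 3)          # allowance for rounding in the cubic terms
    assert f1 - tol <= w <= f2 + tol
print("every sampled triple of positive roots satisfies f1 <= w <= f2")
```

With a double root the corresponding bound is attained: roots 1, 1, 2 give u = 4, v = 5 and $f_2 = 2 = w$.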


4. LIMITING DISTRIBUTIONS OF U OR Y FOR l = 2(1)6


(a) Certain integrals. We first list the following integrals, which are used in finding the limiting distributions of U or Y.


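(i) We make use of Legendre's duplication formula for the gamma function, namely

$$\Gamma(2n) = \frac{2^{2n-1}}{\sqrt{\pi}}\,\Gamma(n)\,\Gamma\left(n + \tfrac12\right). \qquad \ldots (4.1)$$

(ii) A definite integral (Larson, 1948), namely

$$\int_0^{\infty} \exp\left[-x^2 - a^2 x^{-2}\right] dx = \frac{\sqrt{\pi}}{2} \exp(-2a), \qquad \ldots (4.2)$$

for a > 0, is made use of quite frequently.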
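Both tools are standard and easy to confirm numerically: (4.1) with `math.gamma`, and (4.2) with a Simpson approximation whose integrand vanishes rapidly at both ends. A sketch; the helper names and quadrature settings are our choices.

```python
import math


def duplication_gap(x: float) -> float:
    """Relative difference between Gamma(2x) and 2^(2x-1) Gamma(x) Gamma(x + 1/2) / sqrt(pi)."""
    lhs = math.gamma(2 * x)
    rhs = 2 ** (2 * x - 1) * math.gamma(x) * math.gamma(x + 0.5) / math.sqrt(math.pi)
    return abs(lhs - rhs) / lhs


def integral_42(a: float, upper: float = 12.0, panels: int = 24_000) -> float:
    """Simpson approximation to int_0^inf exp(-x^2 - a^2 / x^2) dx (the tail beyond 12 is below e^-144)."""
    def f(x: float) -> float:
        return 0.0 if x == 0.0 else math.exp(-x * x - a * a / (x * x))
    h = upper / panels
    s = f(0.0) + f(upper)
    for i in range(1, panels):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3.0


for x in (0.75, 2.5, 7.0):
    print(x, duplication_gap(x))   # rounding-level differences only
for a in (0.3, 1.0, 2.0):
    print(a, integral_42(a), math.sqrt(math.pi) / 2 * math.exp(-2 * a))
```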
(iii) Now we give the following two definite integrals (Bagai, 1960), which we have evaluated ourselves for the purpose. Consider


(a) T=4 fe exp [—2(z--aa-1))dz, 
then 


+[1—+@. ym aen E oper et Ht] 


(4.3) 
where γ is Euler's constant.


(b) L(a)—2 IE exp(—a?—aa-1)dz 


2 92 (4 з в 1 
L(a) = [y—log a} [2.8 diga 8 vf 


+[1+ 29 Gey a cob gg tite tH ee | 


p 
= DEEP а, 105% ал. 
х, a i Es Jtr 
v Ё Tatas 8L 13870 7 ] a 


where again γ is Euler's constant.


It may be noted that in place of the definite integral (4.3), we can also, by letting r = 2, p = 2 and L = 8a, use the following integral [Bateman, p. 146 (29)], which involves the modified Bessel function, whose tabular values are readily available; i.e., for both p and L positive,
ir 

fra aL ) =2 (2 \ кут 

[ Г exp( HR di —2 ( т ) (v bp) 


where Ky = T. L0 9 vee (45) 
а p 
pe Vi uer [ mll(r--m3-1) ] 


(b) Distributions. The moment generating function of (1.9) is
о ә 1 1 1 ta 7 P 
mit) = |... [ KG, m(0,0, -. бу" oxp [-2 они с] П П (0-0) П dc. 


from which the h-th moment $\mu'_h$ about the origin is
2m-4-i4-1 
1 , (x) 
DU M ae | 
Ph Kl, m+h) а pee ) 
2 
This h-th moment shows that the moments of the limiting distribution of the product of the roots ($C_1 C_2 \cdots C_l$) can be determined from the following


exp (—v) тещи, . ВХР (=) отр, ... косы отно, 


2mF2 Im+3\ 5 2m4-14-1 
(==) r (7) тз) 
1 
exp (—X vj) dE 
or from cc TFI onm, ems "n dv; 1.10.6) 
T(m4-1)r (n) eae (mt) 
where 0€ v; «o, a E A A 


Now we determine the limiting distribution of the product ($C_1 C_2 \cdots C_l$) of the roots for particular values of l. For l = 2, the result (Rao, 1952) is already known. We have also (Bagai, 1960) determined independently that $2\sqrt{C_1 C_2}$ is a gamma variate of parameter (2m+1).
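The gamma-variate result for l = 2 rests on the duplication formula (4.1): if $v_1$ and $v_2$ are independent gamma variates with parameters $a$ and $a + \tfrac12$, then $E[(2\sqrt{v_1 v_2})^h] = \Gamma(2a+h)/\Gamma(2a)$, the h-th moment of a gamma variate with parameter $2a$. A numerical check of that moment identity; how $a$ is tied to m depends on the scaling in (1.9), which this sketch does not assume.

```python
import math


def product_moment(a: float, h: float) -> float:
    """E[(2 sqrt(v1 v2))^h] for independent v1 ~ Gamma(a), v2 ~ Gamma(a + 1/2), unit scale."""
    return (2 ** h
            * (math.gamma(a + h / 2) / math.gamma(a))
            * (math.gamma(a + 0.5 + h / 2) / math.gamma(a + 0.5)))


def gamma_moment(shape: float, h: float) -> float:
    """E[G^h] for G ~ Gamma(shape), unit scale."""
    return math.gamma(shape + h) / math.gamma(shape)


for a in (0.5, 1.0, 2.75):
    for h in (1, 2, 3.5):
        print(a, h, product_moment(a, h), gamma_moment(2 * a, h))
```

Since a gamma law is determined by its moments, equality of all moments identifies the distribution of $2\sqrt{v_1 v_2}$.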
For l = 3, substituting l = 3 in (4.6), we get the joint distribution of $v_1$, $v_2$ and $v_3$ as follows
Ca vp o3 ody dy do, s (4.7) 
r(n--)r (m+) гоно) 

where $0 \le v_1, v_2, v_3 < \infty$.

Making use of (4.1), the joint distribution (4.7) takes the following form 


2т+ exp (—v,—v3—vs) mah ml vee (4.8) 
Va Tin 5 1 т andes, E 


Setting 4v,v,v, = V2, 08 = V2 and va = V3, (4.8) reduces to 


d UA ИБ ViV$ Vi. 4.9 
Vr Г ЕГ ЕУ exp ( aT" Yi)av, av, ar, ... (4.9) 
where 0€ V, V2, У, < o. 


Integrating (4.9) with respect to $V_3$ with the help of (4.2), we obtain the joint distribution of $V_1$, $V_2$ as follows
4yinaüy, 
T'(m--1)P(2m4-3) 
where $0 \le V_1, V_2 < \infty$.
Again integrating (4.10) with respect to $V_2$, we get the distribution of $V_1 = 2\sqrt{v_1 v_2 v_3}$ (or $= 2\sqrt{C_1 C_2 C_3}$) as follows:


exp (у "Чат, av, 2. (4,10) 


2yina 
Гот 1) (2m 41-3) 
where $0 \le V_1 < \infty$ and $L(V_1)$ is defined in (4.4) by setting $a = V_1$.
For l = 4, substituting l = 4 in (4.6), we get the joint distribution of $v_1$, $v_2$, $v_3$ and $v_4$ as follows


тат, ue (411) 


4 
=> 
oP ( 2,9) Чр (4.12) 
3 BiU UE opt. of eL v ... (4. 
Tin) P (m )rn--r(n-- 5 ) 
where $0 \le v_1, v_2, v_3, v_4 < \infty$.


Making use of (4.1), the joint distribution (4.12) reduces to, 


4 
gamys Хр (32 2 Hy 
E TOn T enga 
where $0 \le v_1, v_2, v_3, v_4 < \infty$.


Setting 1600030, = Vi, vavava = V3, vava = Vg and о = V? in (4.13), we get the joint 
distribution of $V_1$, $V_2$, $V_3$ and $V_4$ as follows


- 4 
Op oth opo орнап do, ... (4.13) 
1 


16 yia И И AnS 
п Гот) Тт) ^ x» ( 1678 V3 n) ahs eM 
where $0 \le V_1, V_2, V_3, V_4 < \infty$.


With the help of (4.2) we integrate (4.14) successively with respect to $V_4$ and $V_3$, and obtain the joint distribution of $V_1$ and $V_2$ as follows


дут Ve 
Tmi3)r8mj4) 3:9? ( 
where $0 \le V_1, V_2 < \infty$. Now, making use of either (4.3) or (4.5), we can integrate out $V_2$ from (4.15) to obtain the
distribution of V,. With the use of (4.3) we obtain the distribution of V,(= 4101000500) 
or 4v0, С, C, C, as follows 


Ws ) dV dV, as (415) 


yinagy ү Jr, 
T@m+2)T@m+4) D log Vi] | 2101 sr sura TuS «] 
к-т qd HA artis 
: rtggi * ih Е. api GTH US UBISI 


GH Hat ]} e «29 


where $0 \le V_1 < \infty$ and γ is Euler's constant. Or, with the use of (4.5), we obtain the
alternative form of the distribution of V,(=4V/0,0r4v4) or 4V/0,0,0,0, as follows 
ВИ RATE e (M 
TGm-r3) Fm 14) (2 Тат, (4.17) 
where 0 < V, < co and K,(2\/V,) is obtained as defined in (4.5). 
For l = 5, 6: A method similar to that used above for l = 3, 4 was applied to the cases l = 5, 6, and the following distributions in the form of definite integrals were obtained.
(i) The distribution of Vj(— 0, С, Cs C, С) is as follows 


games , Vnd y, e DE OVI By ip йү, o (4.18) 
Tmt Ти) ЕБ) у, о "e oe ( Vista ) 4 
where $0 \le V_1 < \infty$.
(ii) Тһе distribution for V,(= РАСА = C, О;) is as follows 


96т+1 утат; = 8 27. 27. -2r 4V AV. 
I2m-F2) Гет--4) F(2m-F-6) E 0 MER exp ( 57 Ey s) s 7 á 


where $0 \le V_1 < \infty$.
ENTROPY OF LOGARITHMIC SERIES DISTRIBUTIONS


By GIFT SIROMONEY
Madras Christian College, Tambaram


SUMMARY. The logarithmic series distribution was introduced by Fisher and Williams. We show that there are two types of logarithmic series distributions, which we shall call Type I and Type II. We evaluate the entropies of the two distributions and present the corresponding interpretations.


Two types of logarithmic series distributions. The logarithmic series distribution was introduced first by Fisher (1943) in connection with some biological populations. Williams (1944, 1947) went on to present many other data which conform to the logarithmic series type. We now show that there are two kinds of logarithmic series distributions.


Type I. Consider a sample of flea-ridden mice, classified according to the number of fleas on each. This is an example of a distribution over the integers 1, 2, ... with relative frequencies $p_1, p_2, \ldots$, where $p_r = f^r / [-r \ln(1-f)]$. We choose to call this distribution the Type I logarithmic series distribution. Other examples of the distribution include the distributions of wet and dry spells (Williams, 1952; Ramabhadran, 1954).


Let $H_1$ be the entropy of this distribution. Then

$$H_1 = -\sum_{r=1}^{\infty} p_r \,\mathrm{ld}\, p_r,$$

where ld denotes the logarithm to the base 2, and $0 < f < 1$. $H_1$ measures the average amount of uncertainty per mouse with reference to the number of fleas on a mouse, when only flea-ridden mice are considered.


Ў i | fria В Ват ваи] 
{ва-2} {=n py}  r(-mü-p 


H, 


T ы = (ld np 
tom (1 z) act шщ PBT E" jit bits, 


zar В" is uniformly convergent for 0 < à, < A < 9, < 1 where ô, and ô, are arbitrary. 
я 


in aa (5 гез), Id(—In(1— f) and aio $ id аА are all monotonic functions of J. 


Hence $H_1$ is a monotonic increasing function of f. Thus f itself is a measure of the uncertainty associated with the number of fleas on a mouse from the sample, and we shall call f the index of uncertainty.
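The monotonicity of $H_1$ in f can be seen directly by summing the series. A sketch that truncates the Type I distribution at a large r (the neglected mass is negligible for the values of f used) and computes the entropy in bits; the function names are ours.

```python
import math


def log_series_pmf(f: float, r_max: int) -> list:
    """Type I logarithmic series p_r = f^r / (r * (-ln(1 - f))) for r = 1..r_max (truncated)."""
    norm = -math.log1p(-f)
    return [f ** r / (r * norm) for r in range(1, r_max + 1)]


def entropy_bits(probs) -> float:
    """H = -sum p ld p, with ld the logarithm to base 2; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


for f in (0.1, 0.5, 0.9):
    p = log_series_pmf(f, 2000)
    print(f, sum(p), entropy_bits(p))   # total mass close to 1; the entropy grows with f
```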
Type II. Let $f_r$ represent the frequency of species with exactly r butterflies (or birds or insects) each, in a population of butterflies of a given locality classified according to species. The Type II logarithmic series distribution equates $f_r$ to $\alpha f^r / r$, $r = 1, 2, \ldots$


It should be noted that this is different from what was done in the case of the Type I distribution: it is a distribution on the set of all species. Thus, if a butterfly caught belongs to a species with exactly r butterflies, its probability is

$$\frac{r}{\text{total number of butterflies}} = \frac{r(1-f)}{\alpha f},$$

and there are $\alpha f^r / r$ species with exactly r butterflies. Let $H_2$ denote the entropy in this case.


case. Then H, measures the average amount of uncertainty per butterfly with reference 
to the species. 


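$$H_2 = -\sum_{r=1}^{\infty}\frac{\alpha f^r}{r}\cdot\frac{r(1-f)}{\alpha f}\,\mathrm{ld}\,\frac{r(1-f)}{\alpha f}$$
$$\quad = -\sum_{r=1}^{\infty}(1-f)f^{r-1}\,\mathrm{ld}\,\frac{r(1-f)}{\alpha f}$$
$$\quad = \mathrm{ld}\,\alpha + \mathrm{ld}\,\frac{f}{1-f} - (1-f)\sum_{r=1}^{\infty} f^r\,\mathrm{ld}(1+r) \text{ bits},$$
since $\sum_{r=1}^{\infty}(1-f)f^{r-1} = 1$. The series $\sum_{r=1}^{\infty} f^r\,\mathrm{ld}(1+r)$ is absolutely convergent, since $0 < f < 1$. When α is fixed, $H_2$ increases with f.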


Let $T_1$ and $T_2$ be the numbers of butterflies in two samples from the same population. Then $T_1 = \alpha f_1/(1-f_1)$ and $T_2 = \alpha f_2/(1-f_2)$, α being the same for both samples. If $T_1 < T_2$, then $f_1 < f_2$. Therefore $H_2$ is greater for larger samples from the same biological population.
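The closed form for $H_2$ and the monotonicity argument can be checked numerically. In this sketch (hypothetical function names; α and f are arbitrary illustrative values; series truncated at `r_max`), the expected species frequencies $\alpha f^r/r$ are treated as exact, as in the text, and the entropy per butterfly is computed directly and from the closed form.

```python
import math


def total_butterflies(alpha, f):
    """T = sum_r r * (alpha f^r / r) = alpha f / (1 - f)."""
    return alpha * f / (1.0 - f)


def h2_direct(alpha, f, r_max=5000):
    """Entropy (bits) per butterfly: alpha f^r / r species each of r butterflies,
    each butterfly of such a species having probability r / T."""
    T = total_butterflies(alpha, f)
    h = 0.0
    for r in range(1, r_max + 1):
        n_species = alpha * f**r / r        # expected number of species with r butterflies
        q = r / T                           # probability attached to one such butterfly
        h -= n_species * q * math.log2(q)
    return h


def h2_closed(alpha, f, r_max=5000):
    """ld(alpha) + ld(f / (1 - f)) - (1 - f) * sum_r f^r ld(1 + r)."""
    series = sum(f**r * math.log2(1 + r) for r in range(1, r_max + 1))
    return math.log2(alpha) + math.log2(f / (1.0 - f)) - (1.0 - f) * series


if __name__ == "__main__":
    for f in (0.5, 0.8, 0.95):
        print(f, total_butterflies(10.0, f), h2_closed(10.0, f))
```

For fixed α, both the total number of butterflies T and $H_2$ increase with f, which is the step used to conclude that $H_2$ is greater for larger samples.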
I wish to thank Dr. W. F. Kibble of the Madras Christian College for his guidance. 
SOME SAMPLING SCHEMES IN PROBABILITY SAMPLING


By T. V. HANURAV 
Indian Statistical Institute 
SUMMARY. In this paper we give a sampling scheme for finite populations, resulting in a given set of inclusion probabilities. Also, we give another scheme with the property that it can be stopped at any stage without distorting the proportionate values of the inclusion probabilities.


1. INTRODUCTION 
We consider a finite population consisting of N units
$$u_1, u_2, \ldots, u_N. \qquad (1.1)$$


A sample ‘s’ from (1.1) is defined as an ordered set of units from (1.1). A sampling design is a set $\mathcal{S}$ of samples ‘s’ from (1.1), with a probability measure ‘P’ defined on it, and is denoted by
$$D = D(\mathcal{S}, P). \qquad (1.2)$$
For any unit $u_i$ of (1.1), let
$$\pi_i = \sum_{s \ni u_i} P(s), \quad 1 \le i \le N, \qquad (1.3)$$
where the summation extends over all samples ‘s’ of (1.2) which contain $u_i$ at least once. Similarly, for any pair $u_i$ and $u_j$, let
$$\pi_{ij} = \sum_{s \ni u_i,\, u_j} P(s), \quad 1 \le i \ne j \le N. \qquad (1.4)$$


$\pi_i$ is called the inclusion probability of $u_i$ and $\pi_{ij}$ the joint inclusion probability of the pair $u_i$ and $u_j$. Clearly, these are the probabilities that a sample ‘s’ from (1.2) contains the unit $u_i$ and the pair of units $u_i$ and $u_j$, respectively.
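Definitions (1.3) and (1.4) translate directly into code. The sketch below is illustrative only: the toy design, the function name and the 0-based unit indices are assumptions, not part of the paper. A design is stored as a mapping from samples (ordered tuples that may repeat a unit) to their probabilities, and a repeated unit is counted once, as in "contain $u_i$ at least once".

```python
from itertools import combinations


def inclusion_probabilities(design, N):
    """design maps each sample s (tuple of unit indices 0..N-1, order and repeats
    allowed) to P(s). Returns pi[i] as in (1.3) and pi_ij[(i, j)], i < j, as in (1.4)."""
    pi = [0.0] * N
    pi_ij = {}
    for s, prob in design.items():
        units = sorted(set(s))            # a unit repeated in s is counted once
        for i in units:
            pi[i] += prob
        for i, j in combinations(units, 2):
            pi_ij[(i, j)] = pi_ij.get((i, j), 0.0) + prob
    return pi, pi_ij


# Hypothetical design on N = 3 units; the sample (0, 0) repeats unit 0.
design = {(0, 1): 0.5, (1, 2): 0.3, (0, 0): 0.2}
pi, pi_ij = inclusion_probabilities(design, 3)
```

Pairs that never occur together, such as units 0 and 2 here, simply have no entry, that is, $\pi_{ij} = 0$. The sum of the $\pi_i$ equals the expected number of distinct units in the sample.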

When we have to estimate the total of a character Y for the entire population (1.1), there are practical situations where the values of another character X, which is closely proportional to Y, are available for each unit of (1.1). It is known that in such situations, the selection of a suitable sampling design which takes these values of X into account yields estimators of the total of Y which are better than those in which this information is not incorporated. A commonly advocated procedure is to select the units from (1.1) with probabilities proportional to the corresponding values of X, with or without replacement, or systematically. These are known as pps sampling procedures. However, it was felt (Horvitz and Thompson, 1952) that selecting the units such that the resulting $\pi_i$'s are proportional to the corresponding X values may be even better in some cases. The problem that arises now is the construction of a sampling design (or, equivalently, a sampling scheme, as pointed out by the author (1962)) which results in a given set of $\pi_i$'s.
This problem was first posed by Horvitz and Thompson (1952), who attempted a solution. However, they limited their attention to the special case when each sample contains only two distinct units, and even in that case the method that they gave has rather severe limitations, which they themselves pointed out. Further, the calculation of the $\pi_i$'s from the probabilities of selection at each draw is quite complicated when the samples are drawn without replacement (as they did), even when the sample size is small, as can be seen from their paper and also from the paper by Yates and Grundy (1953).

In practice, it is necessary not only to be able to give an estimator of the population total of Y, say $\hat{Y}$, but also an unbiased estimator of its variance. For the estimators known to us, it is necessary that
$$\pi_i > 0 \quad \text{and} \quad \pi_{ij} > 0 \quad \text{for all } 1 \le i \ne j \le N.$$
Hence we require a sampling scheme which, while resulting in a given set of $\pi_i$'s, is suitable for an easy calculation of the $\pi_{ij}$'s. Certainly, a scheme consisting of independent draws is advantageous for this purpose. For, if the draws are independent and $p_i^{(r)}$ is the probability of selecting $u_i$ in the r-th draw ($1 \le i \le N$), we have
$$\pi_i = 1 - \prod_{r=1}^{\nu}\left(1 - p_i^{(r)}\right),$$
$$\pi_{ij} = 1 - \prod_{r=1}^{\nu}\left(1 - p_i^{(r)}\right) - \prod_{r=1}^{\nu}\left(1 - p_j^{(r)}\right) + \prod_{r=1}^{\nu}\left(1 - p_i^{(r)} - p_j^{(r)}\right), \qquad (1.5)$$
where ν is the total number of draws made.
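For independent draws, $\pi_i = 1 - \prod_r (1 - p_i^{(r)})$ and $\pi_{ij} = 1 - \prod_r(1 - p_i^{(r)}) - \prod_r(1 - p_j^{(r)}) + \prod_r(1 - p_i^{(r)} - p_j^{(r)})$ need only the per-draw probabilities. The sketch below (hypothetical draw probabilities and function names) compares these products with a brute-force enumeration of every outcome of the draws, where a draw whose probabilities add to less than 1 selects no unit with the remaining probability.

```python
from itertools import product


def pi_independent(draws, i):
    """pi_i = 1 - prod_r (1 - p_i^(r)) for independent draws."""
    miss = 1.0
    for p in draws:
        miss *= 1.0 - p[i]
    return 1.0 - miss


def pij_independent(draws, i, j):
    """(1.5): 1 - prod(1 - p_i) - prod(1 - p_j) + prod(1 - p_i - p_j)."""
    a = b = c = 1.0
    for p in draws:
        a *= 1.0 - p[i]
        b *= 1.0 - p[j]
        c *= 1.0 - p[i] - p[j]
    return 1.0 - a - b + c


def brute_force(draws, i, j):
    """Exact enumeration: draw r selects unit k with probability p^(r)[k], or no unit
    (None) with probability 1 - sum(p^(r)). Returns (pi_i, pi_ij)."""
    N = len(draws[0])
    outcomes = list(range(N)) + [None]
    pi_i = pi_ij = 0.0
    for combo in product(outcomes, repeat=len(draws)):
        prob = 1.0
        for p, k in zip(draws, combo):
            prob *= (1.0 - sum(p)) if k is None else p[k]
        if i in combo:
            pi_i += prob
            if j in combo:
                pi_ij += prob
    return pi_i, pi_ij


# Hypothetical per-draw probabilities for N = 4 units and 3 draws.
draws = [[0.5, 0.3, 0.2, 0.0], [0.0, 0.1, 0.4, 0.4], [0.25, 0.25, 0.25, 0.15]]
```

Only draws in which $u_i$ or $u_j$ has positive probability contribute a factor different from 1, which is why schemes with few such draws make (1.5) cheap to evaluate.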


In Section 2, we give a sampling scheme which results in a given set of inclusion probabilities $\pi_i$ and which is valid for the most general values of the $\pi_i$'s. Some of the advantages of this method are pointed out. In Section 3 we give a method of selecting units, one by one and independently, which has the advantage that we can stop at any stage we like without distorting the proportionate values of the $\pi_i$'s at each stage. Some of these results were presented to the Indian Science Congress session at Bombay in 1960. Subsequently, Murthy (1960) gave another method of obtaining a given set of $\pi_i$'s by systematic sampling, and Hartley and Rao (1959) also dealt with the same problem. However, their methods are different.


2. SAMPLING SCHEME 

For the present we shall ignore the necessity of having $\pi_{ij} > 0$ for all i and j, and explain a method of obtaining the given values $\pi_1, \ldots, \pi_N$ for the inclusion probabilities. The slight modifications necessary to ensure the conditions $\pi_{ij} > 0$ ($i \ne j$) will be given later.

Suppose, without loss of generality, that
$$\pi_1 \ge \pi_2 \ge \pi_3 \ge \cdots \ge \pi_N. \qquad (2.1)$$
If this condition is not satisfied, we need only rearrange the members of (1.1) suitably. Let $k_1$ be the positive integer such that
$$S_1 = \sum_{i=1}^{k_1}\pi_i \le 1 < S_1 + \pi_{k_1+1}. \qquad (2.2)$$
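Choose the units $u_i$ in the first draw with probabilities given by
$$p_i^{(1)} = \begin{cases} \pi_i & \text{if } 1 \le i \le k_1,\\ 1 - S_1 & \text{if } i = k_1+1,\\ 0 & \text{if } i > k_1+1. \end{cases} \qquad (2.3)$$
We now choose $p^{(2)}_{k_1+1}$ such that the probability of including $u_{k_1+1}$ in at least one of the first two draws equals $\pi_{k_1+1}$, so that we can dispense with this unit in the subsequent draws. Since $u_{k_1+1}$ is missed in both draws with probability $S_1\{1 - p^{(2)}_{k_1+1}\}$, we must have
$$p^{(2)}_{k_1+1} = \frac{\pi_{k_1+1} - (1 - S_1)}{S_1}. \qquad (2.4)$$
As in (2.2), let $k_2$ be such that
$$S_2 = p^{(2)}_{k_1+1} + \sum_{i=k_1+2}^{k_1+k_2+1}\pi_i \le 1 < S_2 + \pi_{k_1+k_2+2}. \qquad (2.5)$$
We then set
$$p_i^{(2)} = \begin{cases} 0 & \text{if } 1 \le i \le k_1,\\ \{\pi_{k_1+1} - (1 - S_1)\}/S_1 & \text{if } i = k_1+1,\\ \pi_i & \text{if } k_1+2 \le i \le k_1+k_2+1,\\ 1 - S_2 & \text{if } i = k_1+k_2+2,\\ 0 & \text{if } i > k_1+k_2+2. \end{cases} \qquad (2.6)$$
We also see that we should have
$$p_i^{(r)} = 0 \quad \text{for } 1 \le i \le k_1+k_2+1 \text{ and } r \ge 3. \qquad (2.7)$$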


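The procedure can be continued till it terminates. If the last draw is the r-th, then we have
$$k = k_1 + k_2 + \cdots + k_r \le N,$$
and there is a probability
$$\epsilon = 1 - \left(p_j^{(r)} + \pi_{j+1} + \cdots + \pi_N\right) \qquad (2.8)$$
of not selecting any unit in the r-th draw, where $u_j$ is the unit shared by the (r−1)-th and r-th draws. At every other draw, the probabilities assigned to the units add up to 1.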
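The construction (2.2) to (2.8) can be sketched in Python as follows. This is a minimal sketch, assuming the $\pi_i$ are already sorted as in (2.1) and each lies in (0, 1]. The function names, the tolerance `tol` and the 0-based indexing are illustrative choices, not the author's. The check at the end recomputes each $\pi_i$ from the draw probabilities. It also shows that two units of the same stratum that are not shared with a neighbouring stratum have $\pi_{ij} = 0$, the defect taken up in the following paragraph.

```python
def hanurav_draws(pi, tol=1e-12):
    """Sketch of the Section 2 scheme. pi: target inclusion probabilities in
    decreasing order, each in (0, 1]. Returns one probability vector per draw."""
    N = len(pi)
    draws = []
    i = 0
    carry = None                  # (index, probability) of a unit shared with the next draw
    while i < N or carry is not None:
        p = [0.0] * N
        total = 0.0
        if carry is not None:     # unit left over from the previous draw, as in (2.4)
            k, q = carry
            p[k] = q
            total += q
            carry = None
        while i < N and total + pi[i] <= 1.0 + tol:   # units that fit wholly, (2.2)/(2.5)
            p[i] = pi[i]
            total += pi[i]
            i += 1
        if i < N:                 # u_i does not fit: it is shared with the next draw
            S = total
            p[i] = 1.0 - S        # last positive entry of (2.3)/(2.6)
            carry = (i, (pi[i] - (1.0 - S)) / S)      # makes its total inclusion pi_i
            i += 1
        draws.append(p)           # the final draw may leave epsilon = 1 - total, (2.8)
    return draws


def inclusion(draws, i):
    """pi_i = 1 - prod_r (1 - p_i^(r))."""
    miss = 1.0
    for p in draws:
        miss *= 1.0 - p[i]
    return 1.0 - miss


def joint_inclusion(draws, i, j):
    """pi_ij from (1.5) for independent draws."""
    a = b = c = 1.0
    for p in draws:
        a *= 1.0 - p[i]
        b *= 1.0 - p[j]
        c *= 1.0 - p[i] - p[j]
    return 1.0 - a - b + c


pi = [0.6, 0.5, 0.4, 0.3, 0.2]    # hypothetical targets, already sorted as in (2.1)
draws = hanurav_draws(pi)
```

Here three draws are produced: units 0 and 1 in the first, units 1 to 4 in the second, and unit 4 alone in the third, with 1 − p left unused. Units 2 and 3 share the second draw only, so they can never both be selected.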

What is essentially done above is that we have ordered the units in a monotonic order of their sizes $\pi_i$, and divided the population into a number of homogeneous strata, which, however, are not strictly non-overlapping, two adjacent strata possibly having a unit in common. (Such units will be called junctional units henceforth.) We then selected one unit from each of these strata. The sampling in one stratum is independent of that in any other stratum. The number of units in each stratum, and their sizes, are so determined as to make the contributions of each stratum to the total of Y nearly of the same order.

SANKHYA: THE INDIAN JOURNAL OF STATISTICS: SERIES A

We notice that if $u_i$ and $u_j$ belong to the same stratum and neither of them is a junctional unit, then $\pi_{ij} = 0$, so that we cannot obtain an unbiased estimate of $V(\hat{Y})$ by our scheme of sampling. We can, however, modify it as follows: we make the first draw with $p_i^{(1)} = \pi_i/\nu_0$, where
$$\nu_0 = \sum_{i=1}^{N}\pi_i.$$
Setting
$$\theta_i = \frac{\pi_i(1 - 1/\nu_0)}{1 - \pi_i/\nu_0}, \qquad (2.9)$$
it is easy to verify (since $1 - (1 - \pi_i/\nu_0)(1 - \theta_i) = \pi_i$) that if the probability of including $u_i$ in the second or subsequent draws equals $\theta_i$, then the total inclusion probability will be equal to $\pi_i$. Hence, all that is necessary is to proceed with sampling for the second and subsequent draws with the $\theta_i$'s, as described above.
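A quick check of (2.9), $\theta_i = \pi_i(1 - 1/\nu_0)/(1 - \pi_i/\nu_0)$: after a first draw with probabilities $\pi_i/\nu_0$, giving $u_i$ the probability $\theta_i$ of inclusion in the remaining draws restores the overall target $\pi_i$. In this sketch (hypothetical targets and function name), the $\theta_i$ would then be passed, in place of the $\pi_i$, to the Section 2 procedure for the second and subsequent draws. The sketch also confirms that the $\theta_i$ keep the ordering of the $\pi_i$, so (2.1) still holds for them.

```python
def first_draw_and_theta(pi):
    """First-draw probabilities pi_i / nu0 and theta_i of (2.9), with nu0 = sum(pi)."""
    nu0 = sum(pi)
    p1 = [x / nu0 for x in pi]
    theta = [x * (1.0 - 1.0 / nu0) / (1.0 - x / nu0) for x in pi]
    return p1, theta


pi = [0.6, 0.5, 0.4, 0.3, 0.2]      # hypothetical targets, nu0 = 2
p1, theta = first_draw_and_theta(pi)
# Overall inclusion after the first draw plus the later draws: 1 - (1 - p1_i)(1 - theta_i)
combined = [1.0 - (1.0 - a) * (1.0 - t) for a, t in zip(p1, theta)]
```

Because the first draw can select any unit, two units of the same stratum can now both enter the sample, which is what makes $\pi_{ij} > 0$ possible.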


However, an alternative and perhaps better method would be to draw two independent subsamples, both of the same expected size. If $\pi'_i$ denotes the inclusion probability for $u_i$ in each subsample, it is easy to see that by setting
$$\pi'_i = 1 - \sqrt{1 - \pi_i} \qquad (2.10)$$
the probability of including $u_i$ in at least one of the subsamples becomes equal to $\pi_i$. Also, if $\pi'_{ij}$ denotes the joint inclusion probability of the pair $(u_i, u_j)$ in a subsample, then
$$\pi_{ij} = 2\pi'_{ij} - \pi'^{2}_{ij} + 2(\pi'_i - \pi'_{ij})(\pi'_j - \pi'_{ij}), \qquad (2.11)$$
so that, knowing $\pi'_{ij}$, we can easily calculate $\pi_{ij}$. The $\pi'_{ij}$'s can be calculated using (1.5).
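The two-subsample device can be verified directly. The sketch below (hypothetical values and function names throughout) checks that $\pi'_i = 1 - \sqrt{1 - \pi_i}$ gives overall inclusion probability $\pi_i$. It then compares the form $\pi_{ij} = 2\pi'_{ij} - \pi'^2_{ij} + 2(\pi'_i - \pi'_{ij})(\pi'_j - \pi'_{ij})$ of (2.11) with a brute-force sum over the four possible states of the pair $(u_i, u_j)$ in each of the two independent subsamples.

```python
import math
from itertools import product


def subsample_pi(pi):
    """(2.10): pi'_i = 1 - sqrt(1 - pi_i) for each of two independent subsamples."""
    return [1.0 - math.sqrt(1.0 - x) for x in pi]


def union_pij(a, b, c):
    """(2.11): pi_ij from pi'_i = a, pi'_j = b and pi'_ij = c."""
    return 2.0 * c - c * c + 2.0 * (a - c) * (b - c)


def union_pij_enumerated(a, b, c):
    """Same quantity by listing the states of (u_i, u_j) in each subsample."""
    states = {"both": c, "i_only": a - c, "j_only": b - c, "neither": 1.0 - a - b + c}
    total = 0.0
    for s1, s2 in product(states, repeat=2):
        has_i = "both" in (s1, s2) or "i_only" in (s1, s2)
        has_j = "both" in (s1, s2) or "j_only" in (s1, s2)
        if has_i and has_j:
            total += states[s1] * states[s2]
    return total


pi = [0.6, 0.5, 0.3]          # hypothetical overall targets
pi_sub = subsample_pi(pi)
```

The pair enters the combined sample either when some subsample contains both units, or when one subsample contains only $u_i$ and the other only $u_j$; these two cases give the two parts of (2.11).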
We shall examine, finally, the advantages of the above method. Even when the auxiliary values X are exactly proportional to Y, it can be seen that $V(\hat{Y})$, where $\hat{Y}$ is the estimator proposed by Horvitz and Thompson (1952), still has a component due to the variations in the effective sample size ν(s) (Rao, 1962). Hence a design should preferably have as small a variation in ν(s) as possible. Further, the stability of the number of distinct units in the sample is an important guiding principle in practical operations, where cost should preferably be stable. The procedure given first is then a good approximation, since repetitions of units are minimised, only the junctional units having a small probability of being repeated. These advantages are sacrificed to some extent in the two modifications given. It can be verified that there can be at most 4 draws for each subsample which give a positive probability to at least one of $u_i$ and $u_j$, so that from (1.5) we see that the calculation of $\pi_{ij}$ involves at most 4 pairs of numbers. This fulfils an important practical requisite of quick and easy calculation of the $\pi_{ij}$'s from the p's, especially if we deal with large samples. Further, by choosing a scheme of independent draws, we avoided the necessity of the tedious calculation of a large number of conditional probabilities of selection. Also, by utilising the $\pi_i$'s fully, we drastically reduced the labour of computing a large number of probabilities. In fact, though we make ‘r’ draws (considering the first method given), out of the r(N−1) independent probabilities to be used, we had only the (r−1) junctional probabilities to calculate, all the others being either zero or a $\pi_i$. When the population is large enough, the gain in the labour of computation is quite considerable.


3. A SEQUENTIAL SCHEME 


We shall now proceed to describe another scheme, which answers an important practical problem that arises frequently. In practice it is not the actual values of the $\pi_i$'s that are of interest to us, but only their proportionate values. Given prior information on our auxiliary character, which is nearly proportional to the characteristic under consideration, we would like to have a sampling scheme for which
$$\pi_i = K X_i \quad \text{for all } i,$$
so that
$$K = \frac{\nu_0}{\sum_{i=1}^{N} X_i} = \frac{\nu_0}{X}, \text{ say},$$
where $X_i$ is the value of X on $u_i$ and $\nu_0 = \sum_i \pi_i$ is the expected effective sample size.


If we reasonably assume that the cost of inspecting a sample is (roughly) proportional to the number of distinct units in it, considerations of available resources fix an upper limit $C_2$, and considerations of desired accuracy fix a lower limit $C_1$, to the expected effective size of the sample, where, of course, $C_1 < C_2$. The value of $\nu_0$ will then be chosen to be some convenient number between $C_1$ and $C_2$, and this fixes K. However, there will be situations when the cost per unit for the inspection of a sample cannot be estimated sufficiently accurately beforehand. Or there may be sudden changes resulting in a considerable difference between estimated and incurred costs. In such cases, it is desirable to have a sampling scheme which allows us flexibility in the choice of the number of units to be selected in the sample, even after the inspection of the sample, unit by unit, has started, and it should be possible to do this without distorting the basic property of the sampling design. In cases where we have information on an auxiliary character, we can choose as the basic property of the design that it should have inclusion probabilities $\pi_i$ proportional to the given $X_i$'s. We now give a method of sampling which completely solves the problem in almost all important practical situations. This scheme consists of independent draws such that at the end of every draw (up to a certain stage), the inclusion probabilities attained till then are proportional to a given set of values $\pi_i$ ($1 \le i \le N$), say. Let $p_i^{(n)}$ be the probability of selecting $u_i$ in the n-th draw, and $\pi_i^{(n)}$ the inclusion probability of $u_i$ at the end of the n-th draw. We want to have


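$$\pi_i^{(n)} = K_n\pi_i, \quad i = 1, 2, \ldots, N;\; n = 1, 2, \ldots, \qquad (3.1)$$
where the $K_n$'s are constants. We can clearly assume that $\sum_{i=1}^{N}\pi_i = 1$, without loss of generality, and choose the $K_n$'s appropriately. At the end of n draws we have
$$\pi_i^{(n)} = 1 - \left(1 - p_i^{(1)}\right)\left(1 - p_i^{(2)}\right)\cdots\left(1 - p_i^{(n)}\right). \qquad (3.2)$$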
Further, we have the conditions
$$\sum_{i=1}^{N} p_i^{(n)} = 1, \quad n = 1, 2, \ldots, \qquad (3.3)$$
since the probabilities should add up to 1 at each draw. Setting n = 1, we have from (3.1), (3.2) and (3.3),
$$K_1 = 1 \quad \text{and} \quad p_i^{(1)} = \pi_i \quad (1 \le i \le N). \qquad (3.4)$$
Setting n = 2 in (3.2), we have for all i
$$\left(1 - p_i^{(1)}\right)\left(1 - p_i^{(2)}\right) = 1 - K_2\pi_i,$$
giving
$$p_i^{(2)} = \frac{\pi_i(K_2 - K_1)}{1 - K_1\pi_i}. \qquad (3.5)$$


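We then have from (3.3),
$$K_2 = K_1 + \left\{\sum_{i=1}^{N}\frac{\pi_i}{1 - K_1\pi_i}\right\}^{-1}. \qquad (3.6)$$
Similarly, setting n = 3, we have for $1 \le i \le N$
$$\left(1 - p_i^{(1)}\right)\left(1 - p_i^{(2)}\right)\left(1 - p_i^{(3)}\right) = 1 - K_3\pi_i,$$
giving
$$p_i^{(3)} = \frac{\pi_i(K_3 - K_2)}{1 - K_2\pi_i}, \qquad (3.7)$$
and (3.3) then gives
$$K_3 = K_2 + \left\{\sum_{i=1}^{N}\frac{\pi_i}{1 - K_2\pi_i}\right\}^{-1}. \qquad (3.8)$$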


In general (writing $K_0 = 0$ conventionally), it can be shown that
$$K_{n+1} = K_n + \left\{\sum_{i=1}^{N}\frac{\pi_i}{1 - K_n\pi_i}\right\}^{-1} \qquad (3.9)$$
and
$$p_i^{(n+1)} = \frac{\pi_i(K_{n+1} - K_n)}{1 - K_n\pi_i} \qquad (3.10)$$
for $1 \le i \le N$ and for n up to a certain limit which we shall presently investigate.
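The recursion $K_{n+1} = K_n + \{\sum_i \pi_i/(1 - K_n\pi_i)\}^{-1}$ and $p_i^{(n+1)} = \pi_i(K_{n+1} - K_n)/(1 - K_n\pi_i)$ can be traced numerically. The sketch below, assuming the normalisation $\sum \pi_i = 1$ used in the text and hypothetical sizes, computes $K_n$ and the draw probabilities for a few draws. It confirms that each draw's probabilities add to 1 and that the inclusion probability after n draws, computed from (3.2), equals $K_n\pi_i$. The guard that raises an error when some $K_{n}\pi_i \ge 1$ is a safety check added for the sketch, not part of the paper's procedure.

```python
def sequential_scheme(pi, n_draws):
    """pi: positive values summing to 1. Returns (K, draws) with K[n-1] = K_n and
    draws[n-1][i] = p_i^(n), following (3.9)-(3.10) with K_0 = 0."""
    K_prev = 0.0
    K, draws = [], []
    for _ in range(n_draws):
        if any(K_prev * x >= 1.0 for x in pi):
            raise ValueError("K_n * pi_i >= 1 for some i; the recursion cannot continue")
        K_next = K_prev + 1.0 / sum(x / (1.0 - K_prev * x) for x in pi)     # (3.9)
        p = [x * (K_next - K_prev) / (1.0 - K_prev * x) for x in pi]        # (3.10)
        K.append(K_next)
        draws.append(p)
        K_prev = K_next
    return K, draws


def inclusion_after(draws, n, i):
    """(3.2): inclusion probability of u_i at the end of the n-th draw."""
    miss = 1.0
    for p in draws[:n]:
        miss *= 1.0 - p[i]
    return 1.0 - miss


pi = [0.25, 0.2, 0.2, 0.15, 0.1, 0.1]     # hypothetical normalised sizes, sum = 1
K, draws = sequential_scheme(pi, 4)
```

Since $\sum_i \pi_i^{(n)} = K_n$, the list `K` is also the expected effective sample size after each draw, which is what one monitors when deciding whether to stop or to make further draws.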


From (3.1) and the assumption that $\sum_i \pi_i = 1$, we see that the expected effective sample size after the n-th draw, which is equal to $\sum_i \pi_i^{(n)}$, is equal to $K_n$. Hence $K_n$ increases with n. When n tends to infinity we expect to get all the units of the population, so that
$$K_n \to N \quad \text{as } n \to \infty.$$
Assuming as before that $\pi_1 \ge \pi_2 \ge \cdots \ge \pi_N$, we see from the assumption $\sum_i \pi_i = 1$ that, except in the degenerate case when all the inequalities become equalities, $\pi_1 > 1/N$. Hence there exists an integer $n_0$ such that
$$\pi_1 K_{n_0} \le 1 \quad \text{and} \quad \pi_1 K_{n_0+1} > 1. \qquad (3.11)$$


The procedure given above then yields in the $(n_0+1)$-th stage a negative probability for $u_1$, as can be seen from (3.10). Hence the procedure is valid up to the $n_0$-th draw only. However, we shall see that this is not a serious limitation in practice. If we wish to sample an expected effective proportion P of the population, we wish to continue the procedure up to the n-th draw, where
$$K_n \ge NP > K_{n-1}. \qquad (3.12)$$
Since we can continue the procedure up to the $n_0$-th draw, as given by (3.11), we can have n draws if and only if $n \le n_0$, that is,
$$\pi_1 K_n \le 1, \quad \text{or approximately} \quad \pi_1 \le \frac{1}{NP}. \qquad (3.13)$$

In practice, P will be of the order of 5% to 10% or even less. Hence (3.13) is satisfied if the maximum of N positive quantities whose sum is equal to 1 is less than ten times their average. In practice this condition will in general be satisfied. However, if there are some units with exceptionally large $\pi_i$'s, we can perform independent binomial trials with the corresponding $\pi_i$'s as the probabilities of success and, after including in the sample the units for which successes are obtained, we can eliminate all these units with large $\pi_i$'s and proceed as before with the remaining units. A simpler and perhaps more practical method is to split each such large unit into 2 or more subunits, each having a small value of $\pi_i$, and treat the selection of any single subunit as resulting in the selection of the corresponding entire unit.


We see then that at the end of every draw, the inclusion probabilities attained are proportional to the given values of an auxiliary characteristic. If after drawing a sample of size n we want to increase the sample size, all we need to do is to perform the (n+1)-th, (n+2)-th, ... draws with probabilities as given by (3.9) and (3.10), until we get the required size. Similarly, in cases where we are not quite sure of the reliability of our estimate of the cost per unit of inspecting a sample, we should make some draws to get the sample and then inspect the units of the sample one by one, in the order in which they are selected, at least after a critical stage is reached. If at any stage we find that our resources are drying up, or that sufficient accuracy is already expected to have been attained, we terminate the inspection at that stage and note the value of ‘n’, the number of draws made in all. The calculations involving the $\pi_{ij}$'s are mathematically simple but computationally not as simple as they are in the procedures given in Section 2; at times, however, the labour is worth the advantages we gain. The procedure has attractive mathematical tractability (which is absent to some extent in the case of the schemes given in Section 2) and allows us to draw samples of size up to 10% of the population and very often even more,
ON ‘HORVITZ AND THOMPSON ESTIMATOR’


By T. V. HANURAV 
Indian Statistical Institute


SUMMARY. In this paper we consider the problem of finding optimal sampling designs for the use of the ‘Horvitz and Thompson’ estimator (1952) to estimate (unbiasedly, unless otherwise stated) the population total of a character Y, when auxiliary information on a correlated character X is available for all the units. Since there does not exist a design in which the variance is uniformly minimum, the optimal designs are obtained by minimising the expected variance under a realistic superpopulation set-up. These turn out to be designs in which the effective sample size is constant for all samples of the design. It is further proved that, with the same criterion for comparison, the Horvitz-Thompson estimator for this optimal class of designs is uniformly superior to Des Raj's (1956) estimator in the symmetrised form, for the sampling designs considered by him, when the average effective sample size is 2.


1. INTRODUCTION
Consider a finite population consisting of N units
$$u_1, u_2, \ldots, u_N. \qquad (1.1)$$


Let Y be a quantitative character, taking the value $y_i$ (which is unknown a priori) on $u_i$ ($1 \le i \le N$). Let $D = D(S, P)$ be a sampling design consisting of a set S of samples s from (1.1), with a probability measure P defined on it. We define
$$\pi_i = \sum_{s \ni u_i} P(s) \qquad (1.2)$$
(where the summation extends over all samples s of S that contain $u_i$) to be the inclusion probability of $u_i$ in D. Similarly, we define
$$\pi_{ij} = \sum_{s \ni u_i,\, u_j} P(s) \quad (1 \le i \ne j \le N) \qquad (1.3)$$
to be the inclusion probability of the pair $(u_i, u_j)$. An unbiased estimator of the population total
$$T = \sum_{i=1}^{N} y_i, \qquad (1.4)$$
as proposed by Horvitz and Thompson, is then given by
$$\hat{T}_1 = \sum_{u_i \in s}\frac{y_i}{\pi_i}, \qquad (1.5)$$
where the summation extends over all distinct units $u_i$ belonging to s (i.e., we ignore repetitions). The variance of $\hat{T}_1$ is given by
$$V(\hat{T}_1) = \sum_{i=1}^{N} y_i^2\left(\frac{1-\pi_i}{\pi_i}\right) + \sum_{i \ne j}\sum y_i y_j\left(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\right). \qquad (1.6)$$
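To see (1.5) and (1.6) at work, the sketch below enumerates a small hypothetical design exactly. It computes the $\pi_i$ and $\pi_{ij}$ from the sample probabilities, checks that the Horvitz-Thompson estimator $\hat{T}_1 = \sum_{u_i \in s} y_i/\pi_i$ averages to T over the design, and compares its exact design variance with (1.6). The design, the values $y_i$, the function names and the 0-based unit indices are illustrative assumptions.

```python
from itertools import combinations


def inclusion_probabilities(design, N):
    """pi_i and pi_ij (i < j) from a design given as {sample tuple: P(s)}."""
    pi = [0.0] * N
    pi_ij = {}
    for s, prob in design.items():
        units = sorted(set(s))
        for i in units:
            pi[i] += prob
        for i, j in combinations(units, 2):
            pi_ij[(i, j)] = pi_ij.get((i, j), 0.0) + prob
    return pi, pi_ij


def ht_estimate(sample, y, pi):
    """(1.5): sum of y_i / pi_i over the distinct units of the sample."""
    return sum(y[i] / pi[i] for i in set(sample))


def ht_variance(y, pi, pi_ij):
    """(1.6): sum y_i^2 (1 - pi_i)/pi_i + sum_{i != j} y_i y_j (pi_ij - pi_i pi_j)/(pi_i pi_j)."""
    N = len(y)
    v = sum(y[i] ** 2 * (1.0 - pi[i]) / pi[i] for i in range(N))
    for i, j in combinations(range(N), 2):      # each unordered pair appears twice in (1.6)
        pij = pi_ij.get((i, j), 0.0)
        v += 2.0 * y[i] * y[j] * (pij - pi[i] * pi[j]) / (pi[i] * pi[j])
    return v


design = {(0, 1): 0.3, (0, 2): 0.2, (1, 3): 0.25, (2, 3): 0.15, (0, 3): 0.1}   # hypothetical
y = [3.0, 5.0, 2.0, 4.0]                                                     # hypothetical
pi, pi_ij = inclusion_probabilities(design, 4)
mean = sum(p * ht_estimate(s, y, pi) for s, p in design.items())
var = sum(p * (ht_estimate(s, y, pi) - mean) ** 2 for s, p in design.items())
```

Every sample here has two distinct units, so $\sum_i \pi_i = 2$; the enumerated mean equals T and the enumerated variance matches (1.6).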


In many situations of practical interest, we have а priori, the values 2;, which 
another quantitative character X, highly correlated with Y, takes on и; (for 1 & i <N). 
In such cases, this information is used to construct sampling designs and esti- 
mators of, T say, which result in a greater precision than those which do not make 
use of this information. Examples of such procedures are pps estimator, Des Raj’s 
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estimator, ratio estimator, regression estimator and so on. The amount of gain in 
precision due to these estimators depends on the degree of correlation between X 
and Y, and to assess this more fully, one needs to assume some broad statistical 
relationship between X and Y, so far as the population (1.1) is concerned. 

When the value $x_i$ is known for the unit $u_i$, it is reasonable to assume, in many practical cases, that the corresponding $y_i$ (which, however, is unknown) is the outcome of a random variable whose expectation is proportional to $x_i$ and whose variance is either partly or fully unknown. The realised value $y = (y_1, \ldots, y_N)$ can thus be considered as the realisation of an N-length random vector from a superpopulation. This concept was introduced by Cochran (1946) and has since been used successfully by many others. We notice here that we tacitly make this assumption when dealing with the pps estimator, the ratio estimator, and almost all estimators used in designs of varying probability sampling, while when using the regression estimator we make a similar but slightly weaker assumption. We explicitly formulate our model, writing $E_c$ and $V_c$ to denote the conditional expectation and variance, given the $x_i$'s, thus:
$$E_c(y_i \mid x_i) = a x_i, \qquad (1.7)$$
$$V_c(y_i \mid x_i) = \sigma_i^2, \qquad (1.8)$$
where a and the $\sigma_i^2$'s are unknown constants. We also assume that $y_i$ and $y_j$ are independent for all $i \ne j$, for given $x_i$ and $x_j$. In particular this implies that
$$E_c(y_i y_j \mid x_i \text{ and } x_j) = a^2 x_i x_j. \qquad (1.9)$$


2. OPTIMAL DESIGNS 

Under the assumptions (1.7), (1.8) and (1.9), we shall proceed to find the sampling designs best suited for the use of $\hat{T}_1$, as given by (1.5), to estimate T. The criteria that we choose for the best are (1) unbiasedness and (2) minimum variance. Clearly, increasing the sample size, while increasing the precision of estimates, also increases the cost. Assuming that the cost of drawing and inspecting a sample s is proportional to the number of distinct units in the sample (which we shall call the ‘effective size’ of s and denote by ν(s) henceforth), we search for the best designs to use (1.5) in the class of all designs having a fixed given value of ν(s) for all s in the design. However, we may relax the latter condition to include designs in which ν(s) can vary from sample to sample, by demanding that the expected value of ν(s) be equal to a given value. This means that the expected cost of sampling is to be fixed, which is a reasonable condition unless the variation of ν(s) in our optimal design is too large. We note that
$$E\,\nu(s) = \sum_{s \in S}\nu(s)P(s) = \sum_{i=1}^{N}\pi_i = \nu_0, \text{ say}. \qquad (2.1)$$
Given the auxiliary information on X, we shall consider only those designs for which the inclusion probability $\pi_i$ is proportional to $x_i$ ($1 \le i \le N$). (This is not only reasonable but probably better, as hinted in Section 3.) Given the expected cost, $\nu_0$ is fixed, and our domain of search then becomes the class of all designs in which
$$\pi_i = c\,x_i \quad (1 \le i \le N), \qquad (2.2)$$
where c is a constant determined by (2.1) thus:
$$c = \frac{\nu_0}{\sum_{i=1}^{N} x_i} = \frac{\nu_0}{X}, \text{ say}. \qquad (2.3)$$
It is seen that in these designs, the variance of (1.5), as given by (1.6), depends on the $y_i$'s and the $\pi_{ij}$'s. Hence it is the set of $\pi_{ij}$'s that is at our disposal in the choice of optimal designs. If there exists a subclass of the above designs for which (1.6) is minimum, uniformly with respect to all possible values of $y = (y_1, \ldots, y_N)$, clearly we have obtained the best designs. But we prove that (1.6) cannot be uniformly minimised with respect to the $y_i$'s. Setting


$$Q = \sum_{i=1}^{N} y_i^2\left(\frac{1-\pi_i}{\pi_i}\right) + \sum_{i \ne j}\sum y_i y_j\left(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\right),$$
which is continuous in the $y_i$'s, the conditions for Q to be minimum require
$$\frac{\partial Q}{\partial y_i} = 2y_i\,\frac{1-\pi_i}{\pi_i} + 2\sum_{j \ne i} y_j\,\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j} = 0$$
for all i and for all sets of y's. Clearly, this is violated by setting $y_i = 1$ for any fixed i and $y_j = 0$ for $j \ne i$. Hence we proceed to do the next best, which is to see whether the expected value of (1.6), when conditioned over given $x_i$'s, can be minimised uniformly with respect to all values of the $x_i$'s, a and the $\sigma_i^2$'s. If even this were not possible, we would have to restrict our population of y's to some specific models. However, we shall prove that there exists an optimum class of designs with given $\pi_i$'s for which the conditional expectation of (1.6) is uniformly minimised. From (1.7), (1.8), (1.9), (2.2) and (2.3), we have


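$$E_cV(\hat{T}_1) = \sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)\left(a^2x_i^2 + \sigma_i^2\right) + \sum_{i \ne j}\sum\left(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\right)a^2x_ix_j$$
$$\quad = \sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)\sigma_i^2 + \frac{a^2}{c^2}\left(\nu_0 - \nu_0^2 + \sum_{i \ne j}\sum\pi_{ij}\right), \qquad (2.4)$$
since $x_i = \pi_i/c$ and $\sum_i \pi_i^2 + \sum\sum_{i \ne j}\pi_i\pi_j = \nu_0^2$. Hence $E_cV(\hat{T}_1)$ is minimum, for given values of the $x_i$'s, a, c and the $\sigma_i^2$'s, when and only when
$$\sum_{i \ne j}\sum\pi_{ij} \qquad (2.5)$$
is minimum. Defining auxiliary random variables $R_{is}$ by
$$R_{is} = \begin{cases} 1 & \text{if } u_i \in s,\\ 0 & \text{otherwise,}\end{cases}$$
we have
$$\pi_i = \sum_{s \in S} R_{is}P(s), \qquad \pi_{ij} = \sum_{s \in S} R_{is}R_{js}P(s)$$
and
$$\nu(s) = \sum_{i=1}^{N} R_{is} = \sum_{i=1}^{N} R_{is}^2.$$
Hence
$$\sum_{i \ne j}\sum\pi_{ij} = \sum_{s \in S}\Big(\sum_{i \ne j}\sum R_{is}R_{js}\Big)P(s) = \sum_{s \in S}\{\nu^2(s) - \nu(s)\}P(s) = V(\nu(s)) + \nu_0^2 - \nu_0. \qquad (2.6)$$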


Hence, for a given $\nu_0$, the expected sample size, (2.6) is minimised when $V(\nu(s))$ is minimised. In other words, the design should contain, as far as possible, samples of the same effective size. When $\nu_0$ is an integer,
$$\min V(\nu(s)) = 0,$$
while if $\nu_0 = [\nu_0] + f$, $0 < f < 1$, $[\nu_0]$ being the greatest integer not greater than $\nu_0$,
$$\min V(\nu(s)) = f(1-f), \qquad (2.7)$$
since we should have
$$\nu(s) = \begin{cases} [\nu_0] & \text{with probability } 1-f,\\ [\nu_0]+1 & \text{with probability } f.\end{cases}$$
In practice, the considerable freedom we have in the choice of $\nu_0$ allows us to take $\nu_0$ as an integer. Even otherwise, (2.7) is negligible in comparison with $(\nu_0^2 - \nu_0)$ for practical values of $\nu_0$.
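The identity (2.6) and the two-point size distribution behind (2.7) can be checked on a small example. In this sketch (a hypothetical design in which ν(s) varies from 1 to 3; function names are illustrative), the double sum of the $\pi_{ij}$ over $i \ne j$ is compared with $V(\nu(s)) + \nu_0^2 - \nu_0$. The distribution putting mass $1 - f$ on $[\nu_0]$ and f on $[\nu_0] + 1$ is then checked to have mean $\nu_0$ and variance $f(1-f)$.

```python
import math
from itertools import combinations


def check_identity(design):
    """Return (sum over ordered pairs i != j of pi_ij, V(nu(s)) + nu0^2 - nu0)."""
    pi_ij = {}
    nu0 = nu_sq = 0.0
    for s, prob in design.items():
        units = sorted(set(s))
        nu = len(units)                       # effective size nu(s)
        nu0 += nu * prob
        nu_sq += nu * nu * prob
        for pair in combinations(units, 2):
            pi_ij[pair] = pi_ij.get(pair, 0.0) + prob
    lhs = 2.0 * sum(pi_ij.values())           # each unordered pair counted twice
    var_nu = nu_sq - nu0 ** 2
    return lhs, var_nu + nu0 ** 2 - nu0


def two_point_size_distribution(nu0):
    """Sizes [nu0] with probability 1 - f and [nu0] + 1 with probability f."""
    base = math.floor(nu0)
    f = nu0 - base
    return {base: 1.0 - f, base + 1: f}


design = {(0,): 0.1, (0, 1): 0.3, (1, 2, 3): 0.4, (0, 2): 0.2}   # hypothetical
lhs, rhs = check_identity(design)
dist = two_point_size_distribution(2.3)
m = sum(v * p for v, p in dist.items())
v_min = sum(p * (v - m) ** 2 for v, p in dist.items())
```

For this design $\nu_0 = 2.3$, and both sides of (2.6) equal 3.4; a fixed-size design would remove the $V(\nu(s))$ term altogether.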


The practical problem of the actual construction of these designs is not solved fully, but the author (Hanurav, 1962)* gave a method of selecting units from (1.1) which results in prescribed general values of the $\pi_i$'s. It is pointed out therein that the resulting design has a very stable value of ν(s), serving as a good approximation for our purpose. We therefore assume that the minimum of (2.6), viz., $(\nu_0^2 - \nu_0)$, can be closely attained in practice, so that for purposes of comparison we can take
$$\min E_cV(\hat{T}_1) = \sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)\sigma_i^2 + \frac{a^2}{c^2}\left\{\nu_0 - \nu_0^2 + (\nu_0^2 - \nu_0)\right\} = \sum_{i=1}^{N}\left(\frac{1-\pi_i}{\pi_i}\right)\sigma_i^2. \qquad (2.8)$$


*Published in the same issue, pp. 421-428. 
3. COMPARISON WITH SYMMETRISED DES RAJ'S ESTIMATOR

For the purpose of this section, we shall make one further assumption besides (1.7) and (1.9), regarding the conditional variances $\sigma_i^2$. A commonly advocated assumption is that the $\sigma_i^2$'s are all equal but unknown. However, in many cases of practical interest (especially when the variates Y and X are positive, as is the case in most sample surveys) it is more realistic to assume that the conditional coefficients of variation are more or less the same for all units, so that the conditional variance increases with the conditional mean. We explicitly write this as
$$V_c(y_i \mid x_i) = \sigma^2 x_i^2 = \sigma_i^2, \qquad (3.0)$$
where $\sigma^2$ is an unknown constant.
We now compare the estimator $\hat{T}_1$, used in the optimal class of designs derived above, with the symmetrised Des Raj (1956) estimator when $\nu_0 = 2$, under the assumptions (1.7), (1.9) and (3.0). We briefly describe this latter estimator.

We draw a sample of size n, say, without replacement. At each draw, the probability $p_i$ of selecting $u_i$, if it is not already selected, is proportional to $x_i$, where $x_i$ has the same meaning as in Sections 1 and 2. If $u_i$ is selected in the first draw, the probability of selecting $u_j$ in the second draw is
$$p_j^{(2)} = \begin{cases} p_j/(1-p_i) & \text{if } j \ne i,\\ 0 & \text{if } j = i.\end{cases}$$
Similarly, if $u_i$ and $u_j$ are selected in the first two draws, in the third draw we have
$$p_k^{(3)} = \begin{cases} p_k/(1-p_i-p_j) & \text{if } k \ne i, j,\\ 0 & \text{otherwise}\end{cases}$$


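where the notation is clear. An unbiased estimator of T in such cases, as given by Des Raj (1956), is
$$\hat{T}_{\text{asym}} = \frac{1}{n}\sum_{r=1}^{n} t_r, \qquad (3.1)$$
where
$$t_1 = \frac{y_{i_1}}{p_{i_1}}, \qquad t_r = y_{i_1} + \cdots + y_{i_{r-1}} + \frac{y_{i_r}}{p_{i_r}}\left(1 - p_{i_1} - \cdots - p_{i_{r-1}}\right) \quad (r = 2, \ldots, n),$$
$u_{i_1}, u_{i_2}, \ldots, u_{i_n}$ being the units successively obtained in the sample, the suffix "asym" denoting that the estimator is asymmetric in the observed values.

Restricting ourselves to the important practical case of n = 2, we write (3.1) thus:
$$\hat{T}_{\text{asym}} = \frac{1}{2}\left\{(1+p_i)\frac{y_i}{p_i} + (1-p_i)\frac{y_j}{p_j}\right\}, \qquad (3.2)$$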
where $u_i$ is selected in the first draw and $u_j$ in the second draw. It is well known that this estimator can always be improved by taking the weighted mean of the different asymmetric estimators for a given unordered sample, the weights being the respective conditional probabilities of obtaining the ordered samples given the unordered sample (Halmos, 1946). Here these weights are $(1-p_j)/(2-p_i-p_j)$ for the order $(u_i, u_j)$ and $(1-p_i)/(2-p_i-p_j)$ for the order $(u_j, u_i)$. Denoting this improved symmetric estimator by $\hat{T}_s$, we have
$$\hat{T}_s = \frac{1}{2-p_i-p_j}\left\{(1-p_j)\frac{y_i}{p_i} + (1-p_i)\frac{y_j}{p_j}\right\}, \qquad (3.3)$$
which is symmetric in i and j, as it should be. We have


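$$V(\hat{T}_s) = \frac{1}{2}\sum_{i \ne j}\sum p_ip_j\left(\frac{1-p_i-p_j}{2-p_i-p_j}\right)\left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2$$
$$\quad = \sum_{i=1}^{N}\frac{y_i^2}{p_i}\sum_{j \ne i} p_j\left(\frac{1-p_i-p_j}{2-p_i-p_j}\right) - \sum_{i \ne j}\sum y_iy_j\left(\frac{1-p_i-p_j}{2-p_i-p_j}\right). \qquad (3.4)$$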
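The symmetrised estimator and its variance can be checked by exact enumeration of all ordered samples of size 2 drawn without replacement. The sketch below uses hypothetical probabilities $p_i$ and values $y_i$, and hypothetical function names. It verifies that $\hat{T}_s = \{(1-p_j)y_i/p_i + (1-p_i)y_j/p_j\}/(2-p_i-p_j)$ is symmetric in i and j, that it is unbiased for T, and that its variance equals $\frac{1}{2}\sum\sum_{i \ne j} p_ip_j\{(1-p_i-p_j)/(2-p_i-p_j)\}(y_i/p_i - y_j/p_j)^2$.

```python
def des_raj_symmetric(i, j, y, p):
    """(3.3): symmetrised Des Raj estimator for the unordered sample {u_i, u_j}."""
    return ((1.0 - p[j]) * y[i] / p[i] + (1.0 - p[i]) * y[j] / p[j]) / (2.0 - p[i] - p[j])


def exact_moments(y, p):
    """Mean and variance over all ordered draws (u_i first, then u_j), no replacement."""
    N = len(y)
    mean = second = 0.0
    for i in range(N):
        for j in range(N):
            if i != j:
                prob = p[i] * p[j] / (1.0 - p[i])
                t = des_raj_symmetric(i, j, y, p)
                mean += prob * t
                second += prob * t * t
    return mean, second - mean ** 2


def variance_formula(y, p):
    """First form of (3.4)."""
    N = len(y)
    return 0.5 * sum(
        p[i] * p[j] * (1.0 - p[i] - p[j]) / (2.0 - p[i] - p[j]) * (y[i] / p[i] - y[j] / p[j]) ** 2
        for i in range(N) for j in range(N) if i != j
    )


y = [3.0, 5.0, 2.0, 4.0, 6.0]       # hypothetical
p = [0.3, 0.25, 0.15, 0.2, 0.1]     # hypothetical draw probabilities, sum = 1
mean, var = exact_moments(y, p)
```

The variance vanishes whenever all the ratios $y_i/p_i$ are equal, which is the usual rationale for taking $p_i$ proportional to a good size measure $x_i$.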


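In order to compare $\hat{T}_1$ and $\hat{T}_s$, we should take the $\pi_i$'s for $\hat{T}_1$ such that $\nu_0 = 2$, so that the expected effective sample size remains the same in both cases. Since $\sum_{i=1}^{N} p_i = 1$, and both the $p_i$'s and the $\pi_i$'s are proportional to the $x_i$'s,
$$p_i = \frac{\pi_i}{2}, \qquad (3.5)$$
so that from (1.7), (1.9), (2.2), (3.0), (3.4) and (3.5),
$$E_cV(\hat{T}_s) = \frac{a^2+\sigma^2}{c^2}\sum_{i \ne j}\sum\pi_i\pi_j\left(\frac{2-\pi_i-\pi_j}{4-\pi_i-\pi_j}\right) - \frac{a^2}{c^2}\sum_{i \ne j}\sum\pi_i\pi_j\left(\frac{2-\pi_i-\pi_j}{4-\pi_i-\pi_j}\right)$$
$$\quad = \frac{\sigma^2}{c^2}\sum_{i \ne j}\sum\pi_i\pi_j\left(\frac{2-\pi_i-\pi_j}{4-\pi_i-\pi_j}\right). \qquad (3.6)$$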


We assume that the minimum variance of $\hat{T}_1$, as given by (2.8), can be closely attained, and that we can neglect the component of variance arising from the slight variations in the effective sample size. Hence we shall proceed to compare (3.6) and (2.8).


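Substituting (3.0) in (2.8) gives $\min E_cV(\hat{T}_1) = (\sigma^2/c^2)\sum_i \pi_i(1-\pi_i)$, so that, since $\sum_i \pi_i = 2$, we have
$$E_cV(\hat{T}_s) - \min E_cV(\hat{T}_1) = \frac{\sigma^2}{c^2}\left[\sum_{i \ne j}\sum\pi_i\pi_j\left(\frac{2-\pi_i-\pi_j}{4-\pi_i-\pi_j}\right) - \sum_{i=1}^{N}\pi_i(1-\pi_i)\right]$$
$$\quad = \frac{2\sigma^2}{c^2}\left[1 - \sum_{i \ne j}\sum\frac{\pi_i\pi_j}{4-\pi_i-\pi_j}\right]$$
$$\quad = \frac{2\sigma^2}{c^2}\left[1 - 2\sum_{i \ne j}\sum\frac{p_ip_j}{2-p_i-p_j}\right]. \qquad (3.7)$$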
Let
$$\chi(p_1, \ldots, p_N) = \sum_{i \ne j}\sum\frac{p_ip_j}{2-p_i-p_j}.$$
When $p_i = 1/N$ for all i, clearly
$$\chi = N(N-1)\cdot\frac{1}{N^2}\cdot\frac{1}{2 - 2/N} = \frac{1}{2},$$
so that in this case
$$E_cV(\hat{T}_s) = \min E_cV(\hat{T}_1).$$
In order to prove that
$$E_cV(\hat{T}_s) \ge \min E_cV(\hat{T}_1),$$
we shall prove that χ is actually maximum when all its arguments are equal to 1/N.


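We have the restriction
$$\sum_{i=1}^{N} p_i = 1$$
on the $p_i$'s. Introducing the Lagrangian multiplier λ, let
$$\psi = \chi - \lambda\left(\sum_{i=1}^{N} p_i - 1\right).$$
We can verify that at the point where $p_i = 1/N$, $1 \le i \le N$, we have, with $\lambda = (2N-1)/\{2(N-1)\}$,
$$\frac{\partial\psi}{\partial p_i} = 0,$$
$$\frac{\partial^2\psi}{\partial p_i^2} = \frac{N(2N-1)}{2(N-1)^2} = b_1, \text{ say},$$
$$\frac{\partial^2\psi}{\partial p_i\,\partial p_j} = \frac{N(2N^2-2N+1)}{2(N-1)^3} = b_2, \text{ say} \quad (i \ne j),$$
so that the Hessian of ψ, bordered by the gradient of the restriction, is given by the $(N+1)\times(N+1)$ determinant
$$H(\psi) = \begin{vmatrix} 0 & 1 & 1 & \cdots & 1\\ 1 & b_1 & b_2 & \cdots & b_2\\ 1 & b_2 & b_1 & \cdots & b_2\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & b_2 & b_2 & \cdots & b_1 \end{vmatrix}.$$
The value of the bordered principal minor of H(ψ) of order r + 1 (involving r of the $p_i$'s) is
$$-r(b_1 - b_2)^{r-1}. \qquad (3.8)$$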
Since $b_1 < b_2$, it follows that (3.8) changes its sign alternately as r increases and that it is negative when r = 1. This shows that χ attains its maximum when all the $p_i$'s are equal to 1/N, and hence it follows that
$$\min E_cV(\hat{T}_1) \le E_cV(\hat{T}_s),$$
which shows that, when we average over the conditional variations of the $y_i$'s, $\hat{T}_1$ is uniformly superior to the symmetrised Des Raj estimator when samples are of average effective size 2.
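The argument above can also be checked numerically. The sketch below (hypothetical random p's; `sigma2` and `c` stand for σ² and c; function names are illustrative) confirms several points:

- χ = ½ at $p_i = 1/N$;
- the two expressions for the excess variance in (3.7) agree when $\pi_i = 2p_i$;
- χ does not exceed ½ at randomly generated points of the simplex;
- central differences reproduce $b_1$ and $b_2$ at N = 5.

For a direct confirmation that χ ≤ ½ everywhere on the simplex, note that $1/(a+b) \le (1/a + 1/b)/4$ for positive a and b. With $a = 1 - p_i$ and $b = 1 - p_j$, this gives $\chi \le \frac{1}{2}\sum\sum_{i \ne j} p_ip_j/(1-p_i) = \frac{1}{2}\sum_i p_i = \frac{1}{2}$.

```python
import random


def chi(p):
    """chi(p_1, ..., p_N): sum over ordered pairs i != j of p_i p_j / (2 - p_i - p_j)."""
    n = len(p)
    return sum(p[i] * p[j] / (2.0 - p[i] - p[j])
               for i in range(n) for j in range(n) if i != j)


def excess_via_pi(p, sigma2=1.0, c=1.0):
    """(3.6) minus (2.8), with sigma_i^2 = sigma^2 x_i^2 and pi_i = 2 p_i."""
    pi = [2.0 * x for x in p]
    n = len(pi)
    ev_s = sigma2 / c**2 * sum(pi[i] * pi[j] * (2.0 - pi[i] - pi[j]) / (4.0 - pi[i] - pi[j])
                               for i in range(n) for j in range(n) if i != j)
    ev_1 = sigma2 / c**2 * sum(x * (1.0 - x) for x in pi)
    return ev_s - ev_1


def excess_via_chi(p, sigma2=1.0, c=1.0):
    """(3.7): (2 sigma^2 / c^2) (1 - 2 chi(p))."""
    return 2.0 * sigma2 / c**2 * (1.0 - 2.0 * chi(p))


def second_derivatives(n, h=1e-4):
    """Central-difference estimates of d2chi/dp_1^2 and d2chi/dp_1 dp_2 at p_i = 1/n."""
    base = [1.0 / n] * n

    def chi_at(d1, d2):
        q = list(base)
        q[0] += d1
        q[1] += d2
        return chi(q)

    b1 = (chi_at(h, 0.0) - 2.0 * chi_at(0.0, 0.0) + chi_at(-h, 0.0)) / h**2
    b2 = (chi_at(h, h) - chi_at(h, -h) - chi_at(-h, h) + chi_at(-h, -h)) / (4.0 * h**2)
    return b1, b2


rng = random.Random(1)
points = []
for _ in range(200):
    w = [rng.uniform(1.0, 2.0) for _ in range(6)]
    total = sum(w)
    points.append([x / total for x in w])
```

The Lagrangian term is linear in the $p_i$, so the unconstrained second derivatives of χ estimated here are also those of ψ.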

Remark: The above result justifies the opinion that when auxiliary information X of the type discussed above is available, it is preferable to choose the sampling scheme so as to make the inclusion probabilities $\pi_i$ proportional to the $x_i$'s, instead of choosing the design with the probability of selection at each draw proportional to the $x_i$'s. We note the assumption involved in taking (2.8) to be the minimum attainable variance when we use (1.5). This amounts to assuming that the given set of $\pi_i$'s can be partitioned into two subsets such that in each subset the total of the $\pi_i$'s is exactly equal to unity. This assumption, though it need not hold in general, is a good approximation in practical cases, especially when the $\pi_i$'s are small quantities, as is usually the case. It also seems reasonable to conjecture the validity of our result even when $\nu_0 > 2$.